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De Gabor 


D. Gabor, Professor of Applied Electron Physics in the Imperial College of 
Science and Technology, London, was born in Budapest, Hungary, and studied 
electrical engineering in Budapest and in Berlin, where he acquired the Degree 
of Dr.-Ing. in 1927. After seven years in a research laboratory of Siemens & Halske, 
Berlin, he worked from 1934 until 1948 in the research laboratory of the British 
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elected Fellow of the Royal Society in 1956. His principal scientific work relates 
to high speed oscillography, gas discharges, physical optics, communication theory 
and techniques, and all types of electronic devices. His present interests include 
plasma physics and “‘intelligent”’ machines. 


T is now a little over ten years since the appearance 
of Shannon’s CommMuNICcATION THroRY, and it is 
time to review the harvest it has brought in, scientific 
d practical. 
For one thing, Inrormattion Turory (as it was soon 
led) has now become mathematically respectable. Pure 
athematicians like Doob will no longer query “whether 
ie mathematical extensions of the author (Shannon) are 
ways strictly honorable.” The work of McMillan, 
einstein, Khinchin and of Shannon himself has made 
SFORMATION THEORY rigorous—and almost unreadable to 
ugineers! It is curious to remember that thirty years 
arher SraristicaL Mrcuanics had a similar adventure, 
nt with a difference. Outstanding mathematicians— 
Yiener, Birkhoff, v. Neumann—could not accept the 
ppy-go-lucky theories of the physicists Boltzmann and 
‘bbs, and they put SratisticAL MECHANICS, in particular 
1e Ergodic Theorem, on a rigorous mathematical basis. 
‘he only difference was that the physicists refused to 
uke any notice of this. By the time the mathematicians 
ad finished the ergodic theory of continuous processes, 
re physicists had discovered quantum statistics, and 
»fused to give physical significance to anything but what 
ne mathematician Rényi calls “entropies of zero order.” 
‘hey never bothered to make Lebesgue-Borel measure 
reory a part of the physical curriculum. 

With so much effort of the best brains spent on founda- 

ons and on coding, it is not surprising that the practical 
chievements of CoMMUNICATION THEORY fall a little 
nort of what could be expected ten years ago. It is not 
nfair to say that most progress has been made where 
pmmunications were already near-perfect. We now have 
omplicated near-optimum codes which will operate with 
n error of 10 “—10°, but the saving in bandwidth is 
ather less than 10 per cent compared with a much simpler 
rror-checking code operating under the same conditions. 
in the other hand, hardly any progress has been made in 
re filtering of speech, or in the transmission of speech 
r pictures in reduced wavebands, where the potential 
wing is not measured in per cents but in orders of 
ragnitude. 
The lack of progress in filtering beyond the early work 
f Wiener and of Lee is due to a great extent, of course, 
5 the enormous mathematical difficulties of a nonlinear 
Iter theory. But it must be admitted that even if we had 
ach a theory, it would not be much good until we know 
hat it is that we want to filter out. The communication 
ystems which are most backward are those which ulti- 
vately serve the human receptor: the ear or the eye. 
‘ommunication between machines can be (with rare 
xceptions) digitalized, and is adequately covered by 
resent efforts. But until we know more exactly what it 
‘that a human ear picks out in a conversation with a 
oisy background, or what the eye picks out “‘at a glance” 
1 a picture, no serious progress is possible, except per- 
aps by lucky inventions. 

We still do not possess in speech perception anything 
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comparable to the Maxwell-Konig theory of color per- 
ception, or Schrédinger’s beautiful variant of it, based 
on the minimum perceptible interval in a three-dimen- 
sional non-euclidian metric. In view of Edwin Land’s 
recent observations, I do not contend that these are 
solid foundations which will stand for all times, but I 
wish we had at least that much solid basis for a theory of 
speech perception. We can guess from the work of Walter 
Lawrence that speech-sound perception is six-dimensional. 
Six parameters (larynx frequency and intensity, hiss and 
the frequency of the first three formants) are sufficient 
for synthetizing very human-sounding speech. What is 
required is a mathematical theory of the selector operators 
for these parameters, and empirical work for establishing 
the /imen or smallest perceptible step in this or an equiva- 
lent 6-dimensional space. 

In picture transmission the situation is even more 
complicated. The experiments of the Gestalt psycholo- 
gists have unearthed a confusing wealth of material 
relating to the perception of still pictures, and D. M. 
McKay’s recent work has proved that time and motion 
make the chaos even worse. This may well be the muddle 
which, according to A. N. Whitehead, must precede a 
successful induction, but in the present case one feels 
inclined to ask: “‘Precede by how many hundred years?” 
It may well be that a device which removes noise from 
television pictures or allows transmitting them in a greatly 
reduced waveband cannot be much simpler than the 
human visual cortex. But to believe that no progress is 
possible before we have understood every optical illusion 
and have built it into a solid theoretical framework is 
dangerous defeatism. There is a gap of a million or so 
between the 20 bits per second which, the psychologists 
assure us, the human eye is capable of taking in, and what 
we offer it in a television picture. For my part, I believe 
that at least the first two orders of magnitude out of the six 
ought to be relatively easy going, and could be overcome 
on the basis of such elementary knowledge as that the 
eye fixes in the first line on contours, and samples what 
is in between (grass, hair, leaves, fabrics). This is a chal- 
lenging problem for the signal analyst, who must first 
pioneer the field before statistical communication theory 
can start. To start at the other end, and begin collecting 
conditional probabilities of signals before one knows what 
the signals really are is one of the commonest errors of 
those who have only superficially digested information 
theory. 

I wanted to dwell on the shortcomings of information 
theory rather than on its many positive achievements, 
not out of perversity, but because I believe that if more 
effort is directed into the No-Man’s Land between raw 
sensory data and the distinguishable signals which are 
the starting point of the statistical theory, the second 
decade of INForMATION THEORY will be as rich in practical 
improvements in communication techniques as the first 
was in intellectual clarifications. 

—D. Gasor 
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Note on Unique Decipherability’ 


KE. T. JAYNESt 


Summary—We consider an alphabet of a letters, used under 
the restrictions: 1) messages uniquely decipherable into words by 
use of one of the letters as a space mark, and 2) words limited to 
a maximum length of L letters. Although imposing these constraints 
simultaneously may cause a large reduction in the channel capacity 
of the alphabet, neither by itself causes any reduction. Accordingly, 
in the absence of constraints other than 1), an inequality of McMillan 
pertaining to uniquely decipherable messages can be made to be 
an equality. 

Defining ‘‘semi-optimal” transmission by the condition that the 
mean transmission time per word is minimized for a given entropy 
per word, we find the attainable rate of information transmission 
under semi-optimal conditions. Transmission at full channel ca- 
pacity is a special case of semi-optimal transmission. Some general- 
izations and analogies to statistical mechanics are discussed. 


INTRODUCTION 


HE purpose of this note is twofold: 1) to give a 

result related to an inequality of McMillan per- 

taining to unique decipherability, and 2) to illus- 
trate the close relationship between problems of 
information transmission and some of the elementary 
problems of statistical mechanics by use of notation 
borrowed from the latter field. In statistical mechanics 
we find that in principle all thermodynamic properties of 
a system are determined if we can evaluate its partition 
function Z; or better still, log Z, in its dependence on the 
various constraints representing experimentally imposed 
conditions. Similarly, many problems of information 
transmission under constraints are, in principle, solved 
if we can evaluate an appropriate partition function. 
For the calculation of channel capacity, this is equivalent 
to the method described by Shannon. The same math- 
ematical procedure also solves a wider class of problems, 
in which we find the transmission rate under what are 
termed ‘‘semi-optimal” conditions. 


CHANNEL Capaciry UNDER CONSTRAINTS 


We have an alphabet of a symbols, each of which can 
be transmitted in unit time. Let J; be the length (number 
of letters) of word w,, and define the partition function’ 


Zr) = 2.2 (1) 


where the sum is over all words in our vocabulary. A 
given vocabulary (7.e., a specific set of possible words) 


* Manuscript received by the PGIT, Jan. 21, 1959. This re- 
search was supported by the USAF under Contract No. AF 49 
(638)-342 monitored by the AF Office of Sci. Res. of the Air Res. 
and Dev. Command. ; 

+ Microwave Lab. and Dept. of Phys., Stanford University, 
Stanford, Calif. i 

1 BH. Schrodinger, ‘“‘Statistical Thermodynamics,’’ Cambridge Uni- 
versity Press, Cambridge, Eng., chap. II; 1948. 


may be regarded as defining a channel. By a theorem o 
Shannon, the capacity of this channel is the larges 
(actually the only) real root of Z(A) = 1. 

The notion of a “word” is meaningful only if ther 
exists some rule by which a sequence of letters can b 
uniquely deciphered into words. If no such rule exists 
then effectively each letter is a word. The partition fune 
tion then reduces to Z(A) = a2‘, and Shannon’s theoren 
gives the well-known channel capacity (all logarithm 
are to the base 2) | 


C = log a bits/symbol. (2 


A necessary and sufficient condition for unique de 
cipherability into words (UD) is given by Sardinas an 
Patterson.’ McMillan* has given two inequalities impliec 
by UD, which had been noted before’’® under more re 
strictive conditions.’ The first, which perhaps deserve 
to be called the fundamental inequality of noiseless coding; 
theory, is in our notation 7 


ZA: lOgiG) ale (3 


Although, as several authors have shown,** this in 


equality can be derived without any reference to infor 
mation theory, the concepts introduced by Shannon give 
it a simple intuitive meaning. 

Any UD coding method is a system of constraint 
which in some way restricts our freedom in choosing the 
successive letters of a message, and defines a particulai 
channel. Eq. (3) expresses the fact that imposing these con 
straints can never lead to a channel with greater capacity 
than the value (2), which corresponds to complete freedon 
of choice. Thus we conjecture that (3) will be fundamenta 
not only for UD, but also for coding systems designed fo: 
any other objective. In general, a constraint will redue 
channel capacity, and a reasonable measure of the ef 
ficiency of a code is the amount of this decrease. 


_?C. BE. Shannon, “The Mathematical Theory of Communica 
tion,” University of Illinois Press, Urbana, Illinois, p. 8; 1949. 

’ A. A. Sardinas and G. W. Patterson, “A necessary and sufficien 
condition for unique decomposition of encoded messages,’ IRE 
CONVENTION ReEcorp, pt. 8, pp. 104-108; 1953. 

4B. McMillan, “Two inequalities implied by unique decipher 
ability,’ IRE Trans. on Inrormation Tuuory, vol. IT-2, pp 
115-116; December, 1956. 

°L. G. Kraft, “A device for quantizing, grouping and codin 
amplitude modulated pulses,’ S. M. Thesis, Dept. Elec. Eng: 
M.1.T., Cambridge, Mass.; 1949. 

° B. Mandelbrot, “On recurrent noise limiting coding,’ Proc 
Symp. on Information Networks, Polytechnic Inst. of Bklyn; Ney 
NiorkwiNie Yel 955! 

7Given also in A. Feinstein, ‘Foundations of Informatio1 
Theory,” McGraw-Hill, New York, N. Y., pp. 17-23; 1958. 

§M. P. Schiitzenberger and R. S. Marcus, “Full decodabl 
code-word sets,’ IRE Trans. on Inrormation Tueory, vol. IT-5 
pp. 12-15; March, 1959. 
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MeMillan* considers also a strong sufficient condition 
¢ UD, called irreducibility, and shows that for fixed 
ord lengths the strong constraint of irreducibility does 
ot reduce channel capacity below that set by the general 
mnstraint of UD. 

‘Trreducibility ensures UD only if the entire message is 
vailable. In applications to communication systems and 
genetics,’ the most common transmission defect is the 
ne wherein certain parts of the message are simply lost. 
Ithough the probability that this would destroy UD for 
1e entire balance of the message is usually very small,’° 
ie is led to ask for a stronger condition of UD “‘in the 
mall.” Without attempting a precise definition of this 
‘rm, we use it in the rough sense that any reasonably 
ng fragment of a message will still be uniquely decipher- 
ole, except for possible end-effects. 

Golomb, Gordon, and Welch® consider channels with 
xed word length k, and a structure constraint much 
ronger than UD, which ensures UD in the small. Among 
neir results, they show (theorems 6 and 7) that for given 
, in the limit of large alphabets even their strong con- 
raint does not reduce capacity below the value (2). 
Suppose we achieve UD by the usual method of choos- 
ig one of the letters, which we call the “space,” and 
sing it only as the terminating letter of each word; call 
us “spacing.” Spacing is a stronger constraint than 
[IcMillan’s irreducibility, but weaker than that of 
-olomb, Gordon, and Welch (although it still accomplishes 
re alm of UD in the small). We wish to find how much 
re channel capacity is reduced by spacing, and to show 
rat this reduction is in fact zero, independently of the size 
; the alphabet, if spacing is the only constraint. 

Due to the spacing constraint, the maximum number of 
ifferent words of length | is not a’, but only 


(a — 1)". 


ividently, any failure to include all of the short words 
2 our vocabulary will have a further adverse effect on 
nannel capacity, in addition to that imposed by spacing. 
‘herefore we use all possible words of total length 1 < L, 
nd the partition function (1) becomes 

| = Paley Ne b= (eles 
| 28 ar arr 


l=1 


(4) 


Noting that for all real A, Z(A) is a decreasing function, 
nd that Z(\) — L/(a — 1) as\ > log (a — 1), it follows 
nat if L = (a — 1), the exact channel capacity is C = log 
1 — 1). If L > (a — 1), we find from (4) the inequalities 
| log (a — 1) < C < loga. (5) 
| 


he latter inequality in (5) is identical to (3), and it goes 
to an equality in the limit L — . But since log a is 


98. W. Golomb, Basil Gordon, and L. R. Welch, “Comma-free 
des,’ Canadian Jour. Math., vol. 10, no. 2, pp. 202-209; 1958. 
10M. P. Schtitzenberger, ‘On an application of semi-group 
ethods to some problems in coding,’ IRE Trans. on [NroRMa- 
on Tuwory, vol. IT-2, pp. 47-60; September, 1956, 
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just the channel capacity we would have with an alphabet 
of a letters without any constraints, our assertion is 
proved. Stated differently, since (3) must hold for any 
method of achieving UD, in the absence of other con- 
straints, no method of achteving UD can be more efficient, 
as measured by channel capacity, than spacing. 

If L < (a — 1), then (4) leads to the inequalities 


\ C 
O-)<p Sx 


the equality sign holding in the limit a> o. 

Numerical values of C obtained from (4) are given in 
Fig. 1. The trend for different a may seem disconcerting; 
one might argue that we are merely tying up one of the 
letters for special use, and so the loss in channel capacity 
could never exceed that due to removal of a single letter 
from the alphabet. However, the loss in capacity is in 
fact greatest for the large alphabets. 


(6) 


1.0 


0.8 

4 0.6 
Se 

log a 04 

OZ 
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| 10 
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100 


Fig. 1—Reduction in channel capacity due to restriction on maxi- 
mum word length, for various alphabet sizes. 


This situation may be understood as follows. Under no 
restrictions on word length, 1 — ©, we show in the next 
section that operation at full channel capacity requires 
a mean word length (/) = a. The space then occurs with 
the same relative frequency, (1/a), that it would have if 
it were not assigned any special function. This is the 
reason why UD, by itself, need not restrict transmission 
rate; the space is being used just as efficiently as any 
other symbol. However, when a becomes large, the 
effect of fixed L is to force (/) < a. It is this tying up of 
channel time by too frequent repetition of the space 
which actually causes all the decrease in channel capacity, 
and explains the lower position, in Fig. 1, of the curves 
for large a. 


SEMI-OPTIMAL ‘TRANSMISSION 


If the word w; occurs with probability, 


as 
Dor= Z(a) ’ 


(7) 


the rate of transmission (entropy per word) is maximized 
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for a given mean length of word; equivalently, the average 
transmission time per word is minimized for a given 
entropy per word. The average length and entropy per 
word, under these semi-optimal conditions, are given 
by” 


Os rete 
p= ma log Z(X) (8) 
S = log ZX) + a), (9) 


which are parametric equations connecting the quantities 
of interest. From them we can construct the ‘operating 
characteristic” of the channel, in which we plot the time 
rate of transmission, 


eee 

(I) 

as a function of the average word length (/). A family of 

operating characteristics, computed from (4) in the case 

of no restriction on maximum word length, 7.e., from the 
partition function 


bits/symbol, (10) 


Zh) = (2d lee (11) 


is given in Fig. 2 for various alphabet sizes. The range of 
attainable operating conditions consists of all points 
lying below the curve. 


6 T Tee 


| i 10 
<i> 


Fig. 2—Maximum attainable transmission rate as a function of 
average word length, for various alphabet sizes. For each curve, 
operation at full channel capacity occurs at the intersection with 
the dotted line. 


The equation represented in Fig. 2 is found by eliminat- 
ing \ from the above equations: 


Sp epee Gye al 
H = log (I) 4 @ log E = | (12) 


uy), T. Jaynes, “Information theory and statistical mechanics,”’ 
Phys. Rev., vol. 106, pp. 620-630, May 15, 1957; vol. 108, pp. 
171-190, October 15, 1957. 
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It is in the limit (1) > © that we are effectively removing 
one letter from the alphabet, and so H — log (a — 1). — 

From (8), (9), and (10) the condition for maximum 
transmission rate is | 


dH _ _logZ _ 


A ee 


or log Z(A) 0. Under these conditions, we find 
H 4 = C, the channel capacity; thus (8) and (9) 
provide a simple alternative derivation of Shannon’s rule 
for calculating channel capacity.” : 

In the case of the partition function (11), operation a 
full channel capacity occurs when H = \ = C = log a; 
and (8) then gives (J) = a, as previously noted. . 

The case a 32 corresponds to the English language, 
if we consider the space and any five punctuation marks as 
included in the alphabet. The average word length in 
English is far less than 32 symbols, and varies with the 
source. One thousand consecutive words from Shannon’s 
fundamental paper? had a mean length (including the 
space) of 5.9 symbols; while a similar analysis of James 
Michener’s “Sayonara” gave a mean length of only 5.4. 
From Fig. 2, we find that because of Michener’s tendency 
to use short words, unique decipherability by spacing 
costs him 0.3 bit per symbol in information content, 
while Shannon’s loss was only 0.25. ; 

The many additional constraints in English cause the 
actual transmission rate to fall considerably below the 
semi-optimal rate. Taking Shannon’s estimate” of the 
redundancy of English as about 50 per cent, the actual 
operating region of English text would be given roughly 
by the circle in lig. 2. From this we see that 1) only about 
15 per cent of the redundancy is due to use of short words, 
and 2) the same rate of information transmission and the 
same mean word length to which we are accustomed could 
be achieved with an alphabet of only 6 symbols (5 letters 
and a space), if used at maximum efficiency. 


(13) 


GENERALIZATION 


The above relations are easily generalized to the case 
where the transmission time is different for different 
symbols, and where we have other types of constraints: 


2 Shannon, loc. cit., p. 26. See also C. HE. Shannon, ‘‘Predictior 
and Entropy of Printed English,” Bell Sys. Tech J., vol. 30, pp 
50-64; January, 1951. Here the estimated redundancy is in 
creased to about 75 per cent, from experiments in which humat 
subjects attempted to restore missing parts of English text. How 
ever, the ability to do this may depend on semantic as well as purely 
statistical factors; and in any event the only properties which coulc 
be utilized for encoding efficiently into a smaller alphabet, ar 
known frequencies. Estimates based on measured letter and wort 
frequencies remain not much greater then 50 per cent, the valu 
used above. Even these measurements suffer from fundamenta 
ambiguities, some of which were pointed out by G. A. Barnard 
“Statistical calculation of word entropies for four western lan 
guages,” IRE Trans. on Inrormation Tuerory, vol. IT-1, pp 
49-53; March, 1955. Fundamentally, of course, it is meaningless t 
say that there exists one and only one “‘true’”’ redundancy for Eng 
lish text; one can speak only of the redundancy corresponding t 
certain specified statistical information. 


or example, let the k’th letter of the alphabet have trans- 
ission time ¢,, and denote the space by the a’th letter. 
en in (1) we interpret J; as the total transmission time 
word w,, and its evaluation is again elementary, with 
e result 


Q-ra 
Y ee Se 
OS aa (14) 
ere 
| a-1 
Wy D2 


rom this the channel capacity and transmission rate 
nder semi-optimal conditions may be found. Constraints 
| the form that certain combinations of letters do not 
ecur may lead to involved mathematical problems, but 
ot to any new difficulties of principle. Shannon’s Theorem 
shows’ that one common type of constraint is included 
we generalize the partition function to a “partition 
-atrix,” operation at full channel capacity then occurring 
hen the greatest eigenvalue of this matrix is unity. 

Of course, the quantity ¢, above need not be interpreted 
s a time. It can equally well stand for the ‘cost,’ as 
easured on any basis, of transmitting the k’th symbol. 
he theory then gives us the method of transmitting 
hich is most economical with respect to this cost assign- 
ent; semi-optimal transmission minimizes the average 
yst per word for a given entropy per word. 


RELATION TO STATISTICAL MECHANICS 


The partition function (11), with the number 2 replaced 

vy e, is almost identical to the one arising in quantum 
satistical mechanics, describing a harmonic oscillator. 
he operating characteristic in Fig. 2, for the case a = 2, 
presents nothing more than an unconventional way of 
lotting the Einstein specific heat function of a harmonic 
scillator. 
Each of the soluble problems of statistical mechanics 
so provides the solution to a certain problem of infor- 
ation transmission under constraints, and the math- 
natical analogy may be set up in other ways than the 
ne indicated here. In the above type of analogy, noted 
epfore by Mandelbrot,’ the mean word length corre- 
yonds to the thermodynamic energy function, the param- 
er \ to the reciprocal temperature. Thus the transmission 
ite H corresponds, not to the thermodynamic entropy 
inction, but to the ratio (entropy) /(energy). 

It is interesting that such a fundamental notion as 
1annel capacity has no thermodynamic analog. In 
ermodynamics the absolute value of the entropy has 
) meaning; only entropy differences can be measured in 
‘periments. Consequently the condition that H is 
aximized, equivalent to the statement that the Helmholz 
ee energy function vanishes (A = H — TS = QO), corre- 
yonds to no condition which could be detected experi- 
entally. 
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Generalization of the above analysis to the case where 
the different words are no longer statistically independent 
is also straightforward, and corresponds to the transition 
in statistical mechanics from the Maxwell-Boltzman 
“molecular” viewpoint, to the Gibbsian “global” view- 
point." We mention two examples of the correspondence 
which then exists. 

The partition function of the linear Ising chain,’ with an 
easy generalization, provides also an explicit solution to 
the problem of encoding a message into binary digits, in 
a way which is optimal from the standpoint of a person 
who knows the digram frequencies of the source, but has 
no other statistical information. The corresponding 
solution for trigrams would be of considerable interest 
in connection with the theory of ferromagnetism. 

The two-dimensional Ising model of ferromagnetism,” 
the partition function of which was first obtained by 
Onsager, gives the solution to a problem in which a mes- 
sage in binary digits has strong correlations between ad- 
jacent symbols, and also between the n’th and the 
(n + M)’th, where M is a large fixed number. Its most 
striking feature is a logarithmic singularity, signifying 
physically a phase transition (ferromagnetic Curie point). 
Translated into communication theory, it can be said that 
at a certain critical strength of the intersymbol correla- 
tions, as measured by the parameter \, there occurs a 
sudden collapse of transmission rate to a very low value, 
dH /d\ becoming infinite at a single point.”' 


CoNCLUSION 


Much of what we have said has already been pointed 
out by others.°'® However, the basic mathematical 
identity of these two fields has had, thus far, very little 
influence on the development of either. There is an 
inevitable difference in detail, because the applications 
are so different; but we should at least develop a certain 
area of common language, so that a worker in one field 
can decide quickly whether work in the other has a bear- 
ing on his problems. 

We suggest that one way of doing this is to recognize 
that the partition function, for many decades the standard 
avenue through which calculations in statistical mechanics 
are “channeled,” is equally fundamental to communi- 
cation theory. Even within communication theory, there 
are advantages to be had by adopting this terminology 


13 G. F. Newell and E. W. Montroll, “On the theory ‘of the Ising 
model of ferromagnetism,” Rev. Mod. Phys., vol. 25, pp. 353-389; 
April, 1953. 

14 This type of message structure strongly resembles that occurr- 
ing in certain styles of music, where strong correlations appear 
after an interval of 2” bars, n being a small integer. This phenomenon 
of collapse in transmission rate then has some amusing implications, 
which we leave for the reader to develop. 

15 A referee kindly informs me that the following reference also 
contains material along the lines discussed here: Apostel, Mandel- 
brot, and Morf, “Linguistic statistique macroscopique”’ in “‘Logique, 
Langage et Theorie de L’information,”’ Presses Universitaires de 
France, Paris, France, pp. 1-78; 1957. 


102 


and notation as standard. For example, expressions of the 
form >> D~™', which occur repeatedly in coding theory, 
are really partition functions. The “rather algebraic” 
nature of this theory derives in part from the fact that 
often only one value of D is considered. If we generalize 
by setting D = 2*, with \ a continuously variable param- 
eter, we have a true partition function, which has 
analytical properties useful in deriving theorems; indeed, 
this is just what McMillan‘ has done. A partition function 
Z(A) is, of course, the same as a generating function of 
the variable ¢ = 2~*. However, from a general standpoint 
the partition function is a more powerful analytical tool 
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because it remains single-valued under conditions where 
the generating function would develop an infinite number 
of Riemann surfaces. 

The way in which the partition function varies for 


different values of \ often tells one the effect of some 
departure from ideal conditions. Thus, in the problem — 


treated above, we see from inspection of Fig. 2 that in 


the case of small alphabets it is essential to encode in- 


such a way that the mean word length is held close to 
the optimal value; while in a large alphabet the mean 
word length can vary widely with very little effect on 
attainable transmission rate. 


On the Use of Laguerre Polynomials in ‘Treating the 
Envelope and Phase Components of Narrow-Band 
Gaussian Noise’ 


IRVING 8S. REED 


Summary—tThe joint probability density of the envelope of a 
Gaussian process at two different times is expanded by the use 
of Hardy’s identity into a series involving Laguerre polynomials. 
It is shown how this result may be used to estimate the cross- 
correlation function of the output of two quite general envelope- 
distorting filters. A generalization of this result, involving the use 
of the associated Laguerre polynomials, is obtained and applied 
to the calculation of a cross-correlation function which involves 
both the phase and envelope of the process at two points in time. 


IF v(t) is a stationary narrow-band gaussian process 

with a power spectrum centered on the angular 

frequency w, then it is well known’ that 2(¢) can 
be represented by 


x(t) = R(t) cos wt + a(d)) 


where #(t) is the envelope and 6@(t) is the phase modulation. 
Price’ has shown that the autocorrelation function of 
x(t) is given by 


gAt) = Ex(t)x(t + 7) 


o p cos (wr + y) 


* Manuscript received by the PGIT, February 18, 1959. The 
work reported here was performed at the Lincoln Lab., a center 
for research operated by Mass. Inst. of Tech. with the joint support 
of the U. 8S. Army, Navy and Air Force. 

7 Staff member, Lincoln Lab., M I.T., Lexington, Mass. 

1S. O. Rice, “Mathematical analysis of random noise, “Bell 
Sys. Tech. J., vol. 24, sec. 3.7, p. 46; July, 1944. 

2 R. Price, “A note on the envelope and phase-modulator com- 
ponents of narrow-band gaussian noise’, IRE Trans. on INFORMA- 
TION TuHeEory, vol. IT-1, pp. 9-13; September, 1955. 


where FE is the expected value operator, o” is the rms 
noise power, 0 < p < 1 is the normalized envelope, and 
y is the phase of ¢,(r) as a function of 7. 
The joint probability density of R(4), R(t), O(t,) and 
6(t,) is given by' 
Teules 


Pig is, 6,, 4.) 72 dro (1 =~ BY 


A(ese oa \ Pp? 2 
exp ( 2Q0°(1 = p) {Ry == Rs 


— 2pk,R, cos (6. —  — »}) (1) 
where R, = R(t,), 0 = O(t,) for (k = 1, 2). Rice’ obtained 
the joint probability density of R, and R,, 


ph. Rs ;) 


Reales zB 
1ae = p) 


(lS) 


P(R,, Rs) = 5 


x lea lo 
a 2o0°(1 — 


from P(R,, R2, 6,, 62) by performing the integrations over 
6, and 6, where J, is the modified Bessel function of the 
first kind. By Hardy’s identity® 


8G. H. Hardy, “Summation of a series of polynomials of Lagu- 
ere,’ J. London Math. Soc., vol. 7, pp. 138-139; 1932. 
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the generalized Laguerre polynomial of order a, we 
pw many express P(A, R,) in a series of terms involving 
t Laguerre functions of zero order, L{°(u) = L,(u). 
jetting 
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and o = t 


h (2), we have 
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‘his expression is analogous to Cramer’s expansion’ of 
he two-dimensional joint gaussian distribution in terms 
f Hermite polynomials. Eq. (4) does not seem to have 
een observed before except by Levin’ whose formula 
; in error. Levin’s result has the associated Laguerre 
metion L'’?(x) in place of the proper Laguerre func- 
on L, (2). 

Suppose, now, that we have two envelope-distorting 
Iters, one with output G,(A(¢)) and the other with output 
'.(R(t)). We are interested in the cross correlation of one 
f these outputs with the other delayed by a time +. 
‘hat is, we wish to calculate 


12(7) = EG,{(R(4))G(R(t)) = HG,(R1)G2(R2) 
=f ane <i a J 
ih 


exp ea [Ri + re} 


G(R,)G(R,) dR, dk). (5) 


4H. Cramer, ‘“Mathematical Methods of Statistics,” Princeton 
Biveretty Press, Princeton, N. J., p. 133; 1946. 

5B. R. Levin, “The Theory of Noise Processes, ”” Moscow Press, 
loscow, U.S. SR. , p. 310; 1957, (in Russian). 
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Proceeding formally, we have by (4) 


“of Also a] 
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If we let 

lin, = oP Pe winel Jk, = oV 2y, 
then 

MG) = Lee (6) 

where 

oe / 6*L,(®)G,(o V/ 22) de, 

0 

and 


9? = / LAWGalo-V/2y) dy 


are the Fourier coefficients (in the general mathematical 
sense) of the Fourier expansions of G,(oV/2z) and 
G,(o V/2y), respectively, in terms of Laguerre polynomials. 

It is well known* that a function f(z) where 0 < 2 < © 
may be expanded in a Fourier series of Laguerre poly. 
nomials of the form 


EAC) 


where 


fa = [EWC le) a 
0 
if the integral 
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and exists in the Lebesgue sense. Thus if we assume that 
Gi(ovV/2x) and Go(o-V/ 2y) are such that 


E | Gilo-V2x) |? e* 


le | Gi(o-v/2y) |? 6? 
0 
[o) 2 
2) kee ee < @ 
0 20 oO 


6G. Szego, 
New York, N. Y., 


“Orthogonal Polynomials,’ American Math. Soe. 
chs. III and V; 1939. 
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and exist in the Lebesgue sense, we are assured that both 
functions have Fourier expansions in Laguerre poly- 
nomials which converge to the original functions in the 
mean-square sense. Moreover, condition (6) may be used 


c . i 0 5 A 6 
in conjunction with the identity 


a 
i| 


and the Schwarz inequality for integrals to justify the 
change of the order of summation and integration we 
used to derive (6). This justification obtains’ by showing 
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Evidently condition (7) is equivalent to the condition 
that the processes from the envelope-distorting filters 
have finite second moments, a condition which is generally 
true in practice. 

Before we consider examples, let us calculate a more 
general expression which will include (5) as a special case. 
We will calculate 
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where R,, R2, 6;, 62 are defined as in (1) and G, and G, 
are defined as in (5). The quantity in (8) can be regarded 
as the cross correlation of the complex process z,(¢) = 
G,(R(t)) e’’“? with conjugate of the process z(t) = 
G.(R(t)) e’“? but at the delay time 7. Evidently, by (1) 


rP@) 


= i | i | eae A, 6s) 
4 0 


ata ff acme 
fits.) = ne Ri +R; 


oo ie) 
4 a) | GeV 20)Gl0-V2y) 


[1.2 exp {Een | own dx dy 


l= ¢ 
7. C. Titchmarsh, “Theory of Functions,’ Oxford University 
Press, Cambridge, Eng., p. 45; 1932. 
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where L“”? (x) is the associated Laguerre polynomial 
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It is well known® that the coefficients g,"? for k = 1, 2 
are the Fourier coefficients of the econ G,(o V/ 22) 
when expanded in a Fourier series of associated Laguerre 
polynomials of the form 
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where k = 1, 2. Clearly, the change of the order of oper- 
ations needed to obtain (9) is justified if we again assume 
condition (7). Finally, formula (9) includes (6) as a 
special case since 
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for (k= 172). | 
Our principle example wili be based on the following — 
integral which will be proved in the Appendix, : 
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where Re a > O and where L°(t) is the generalized - 
Laguerre polynomial 
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vhere 2, (a, b; c; 2) is the hypergeometric function 
lefined by 
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When a = 8 = pv with » real, (11) may be shown to be 


quivalent to a result first obtained by Middleton® for 
he correlation function of the m’th harmonic of the output 


8D. Middleton, “Some general results in the theory of noise 
‘hrough nonlinear devices,” Quart. Appl. Math., vol. 5; 1948. 
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of a v’th law non-linear device into which narrow-band 
gaussian noise is fed. If a = 6 = 0, (11) is the m-th 
Fourier coefficient of the probability density for 6. — 61, 
first obtained by MacDonald.” If a = r, 8 = s and m = 0 
where r and s are positive integers, then (11) is the 
formula for the moments of joint envelope distribution 
(2). Finally, it is of some interest to consider a special 
case of (11) when a and 6 are complex. If a = — 76 and 
6 = 26 where 6 is real and m = O, then (11) reduces to 


. 2 : 5 
Ee®? log R2/Ri = | r(1 + 2) (2 aa 16 « il : ’). (12) 


This is a cross-correlation function of two functions of 
the envelope R(t) at the respective times ¢, and ¢, which is 
independent of the noise power o. Moreover, we have 
calculated, incidentally, the characteristic function of 
the random variable log R,/R,. Since the right side of 
(12) is real and positive, we have 


EH (log Rahe == OQ, 


and 


E (log R,/R,)”" 
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as the moments of the random variable log R,/R,. 


APPENDIX 


The integral in (10) is a special case of the formula 


of 
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to obtain formula (10). Since L(t) is a polynomial, it is 
easy to see that the integral exists when Re a > 0. 


9P, K. C. MacDonald, ‘Some statistical properties of random 
noise,’’ Proc. Camb. Phil. Soc., vol. 45; 1949. 

10.6: Truesdall, ‘A Unified Theory of Special Functions,’ 
Princeton University Press, Princeton, N. J., p. 112; 1948. 

11. T. Copson, ‘Theory of Functions of a Complex Variable,” 
Oxford University Press, Cambridge, Eng., Chapter X; 1935. 
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Interchannel Correlation in a Bank of Parallel Filters’ 


J. GALEJSt anp W. COWANT 


Summary—tThe first-order effects of interchannel noise correla- 
tion on the fasle alarm and incorrect dismissal probabilities are 
computed for a bank of parallel RLC filters by expanding the en- 
velope distribution of the n filter outputs in a power series. 

The correction to the false alarm probability due to noise correla- 
tion is found to decrease with increasing threshold-to-rms-noise 
ratios. If the filter-separation-to-filter-bandwidth ratios are larger 
than 0.2, it is less than 15 and 0.2 per cent for threshold-to-rms- 
noise ratios exceeding 12 and 14 db respectively. 

The correction to the incorrect dismissal probability, which is 
computed by considering the signal output of three contiguous 
filters, increases with increasing threshold-to-rms-noise and signal- 
to-threshold ratios. Even for filter separations larger than the 
filter bandwidth, it may be in excess of 100 per cent if the threshold- 
to-rms-noise ratio exceeds 12 db and the signal-to-threshold ratio 
is larger than 1.2. 


INTRODUCTION 


ANY existing radar systems use banks of parallel 
filters prior to signal detection. Such filters act 
as coherent integrators or IF doppler filters. 

The determination of the false alarm or incorrect dismissal 
probabilities, which may be regarded as absolute criteria 
of system performance in a prescribed signal and noise 
environment, is complicated by the fact that noise out- 
puts of the different filters are partially correlated. 
Because of the partial overlap of the filter pass bands, 
the noise outputs of adjacent filters will have a certain 
degree of similarity which affects the statistics of the 
filter outputs. In the simplest computational procedure, 
the partial noise correlation of the filter outputs is ignored 
and the envelope distribution of the outputs of 7 filters 
is computed as the product of the envelope distributions 
of the individual filters. If the partial noise correlation is 
taken into account, the envelope distribution of the n 
filter output is described by an n-fold integral, which 
cannot be evaluated in closed form. If the integrand is 
factorized after a power series expansion, the evaluation 
of this integral reduces to the evaluation of a product of 
n single integrals. The first-order effect of correlation 
between the filter outputs can be obtained by considering 
the first non-zero terms in this power series expansion. 
Such a procedure has been applied for computing the 
false alarm rates of idealized coherent integrators, which 
integrate with uniform weighting the envelopes of the 
two phase-quadrature noise components.’ However, no 
numerical results showing the magnitude of the inter- 
channel correlation effects were given except for a state- 


* Manuscript received by the PGIT, February 4, 1959. 

7 Appl. Res. Lab., Sylvania Electric Products Inec., Waltham, 
Mass. 

1K. S. Miller and R. I. Bernstein, “An analysis of coherent 
integration and its application to signal detection,’ IRE Trans. 
on InrormaTIoNn THEORY, vol. IT-3, pp. 237-248; December, 1957. 


ment that these effects are small.’ A similar computational | 
procedure can be used to approximate the amplitude 
distribution in the outputs of n parallel RLC filters if 
noise or noise plus signal are applied to the filter inputs. 
A series expansion in powers of correlation coefficients 
results in a factorized distribution that can be readily 
integrated to compute the false alarm and incorrect dis- 
missal probabilities. 


THe APPROXIMATE ENVELOPE DISTRIBUTION 
oF n FILTER OvuTPUTS 


The envelope distribution of the filter-outputs is com- 
puted using techniques already described.*'*'* Therefore, 
only a brief outline of the mathematical development 
will be included in this paper. 

The system under consideration consists of n parallel 
RLC filters, each followed by an envelope detector and a 
threshold device as indicated in Fig. 1. The filters have a 
half-power bandwidth w, and are tuned to frequencies 
w;. The noise voltage output of a single filter which has | 
been split into in-phase and phase-quadrature components 
can be represented as 


V(t) = V2.0) Cos wt + V;(0) sin w,t. 
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Fig. 1—Block diagram of the system. 
Assuming that the noise components have zero mean 
Gaussian amplitude distributions, the averages of the out- 


put components of two RLC filters tuned to frequencies 
w; and w, are 
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2 Op. Cit. Ds 240. 

3§. O. Rice, “Mathematical analysis of random noise,’ Bell 
Sys. Tech. J., vol. 23, pp. 282-332; July, 1944; and Bell Sys. Tech. J., 
vol. 24, pp. 46-156; January, 1945. 

*D. Middleton, “Some general results in the theory of noise 
through nonlinear devices,’ Quart. Appl. Math., vol. 5, pp. 445- 
498; January, 1948, 
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he averages p,, and u;, represent the individual elements 
the correlation matrix of a 2n-dimensional Gaussian 
stribution that characterizes V.; and V,; of n filter 
utputs..'°" 
In the absence of a signal, the envelope of a filter output 
n_be computed as the absolute magnitude of V;(¢) in 
1). If a signal of amplitude A and of frequency w, is 
pplied to the filter bank, signal components of amplitude 
; will appear in the j’th filter. The amplitude A; depends 
n the frequency separation 


Wj 


(4) 


‘he signal components A, modify both the in-phase and 
jhase-quadrature components of the filter output. The 
‘th filter output can also be described by an amplitude 
rp; and phase angle 6;. The variables R; and 6; are thus 
nections of V,;, V.;, and of A;. Alternately, the noise 
utput components V.; and V,; that are characterized 
y a 2n-dimensional Gaussian distribution can be related 
A;, R;, 6;. By substituting A,;, R;, and 6; in the appropri- 
te functional relations for V.; and V,; in the 2n-di- 
| ata Gaussian distribution and using the required 
acobian, one obtains a probability distribution W(R,, 
i, °°: , &,, 0,) of the variables R,, 01, --- , Ra, &,. The 
oint envelope distribution of the n filter outputs R,, 
++, R, is given by the n-fold integral 
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_ The complexity of the W function prohibits a direct 
evaluation of this integral. However, one can expand 
(1, -++, R,) in powers of 


Os = OY ce Sis Dik , (6 


which reduces products like f,(0;) f2(6@,) to sums h,(@;) + 
h2(,) in the exponents of the W function in (5). As a 
sonsequence, the n-fold integral (5) reduces to a product 
of n single integrals, each between the limits of 0 to 2r. 
Designating the envelope distribution of n outputs with 
ull Jix’S set equal to zero by p(0), (5) can be expanded as* 


anh) () ane COL re 


Te sO) COs 0a (5) 


Ss 


p(R,,-°: ,R,) = pO) + 2G, ~ (PO) Io. 
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The derivatives of p(0) are computed by setting all g;,’s 
except the ones to be differentiated equal to zero before 
the differentiation. If the series expansion is terminated 
with the quadratic terms, the variables that are related 
by the non-zero g,,’s are characterized by at most a 6 by 
§ correlation matrix.’ For a single signal appearing in the 
zenter of the 2’th channel, (7) reduces to 


5 If 2,k,7,s have 3 distinct values, there is a 6 X 6 correlation 
matrix. If 7,k,7,s have 4 distinct values, there are two 4 X 4 matrices. 
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and where J, is the modified Bessel function of the first 
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kind and order n. The symbols r and m are given by 


r=2i—k (12) 
and 

n ifz > 0.5 + 1), 
‘ —1lifi<0.5(n+ 1). 


The variable 7 depends on 7 and k. E of (11) is therefore 
characterized by only two independent parameters, 7 
and k. 


(13) 


Fausp ALARM PROBABILITY 


The probability of a false alarm @ in the output of a 
bank of 7 filters is equal to the probability that at least 
one filter output exceeds a prescribed threshold R, in 
the absence of signal. Thus 
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where the probability distribution of the filter output 
envelopes p(R,, Ra, - R,) is computed with all the 
signal amplitudes A; set equal to zero. A; = O results 
in zero E;,. D;, reduces to’ 
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The false alarm probabilities obtained by considering 
or by ignoring the interchannel correlation effects are 
denoted by aeorr OF DY po corr Tespectively. Evaluation 
of (14) shows that 


0.252" exp (—2)[1 — exp (—0.5z)]"”” 


Ro 
| PRRs oR, ah aOR) 


Di. = 


(15) 


&nocorr ~ Qeorr __ 
Ono corr - = [1 7 (2.40) (—0.52) |” 
=< nt 
, =a 1 + vy 5) (16) 
where 
_ Ro 
QS = 17 
; (17 
and 
y = Aw = (War are! wi) (18) 
Wa Wa 


The changes in false alarm probability due to noise 
correlation are computed numerically for different 
threshold-squared-to-mean-square-noise ratios «, filter- 
separation-to-filter-bandwidth ratios y, and numbers of 
filters n. The upper limit of the summation over 7 was 
varied for » = constant in the computations leading to 
the curves of Fig. 2 in order to show the effect of noise 
correlation between increasingly more distant filters. The 
upper limit of the summation was equal to (n — 1) in 
the plots of Figs. 3-7; these plots take noise correlation 
between all » filters into account. 
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k = MAXIMUM NUMBER OF CHANNELS 
OVER WHICH NOISE CORRELATION 
IS CONSIDERED 


3 
t 


y= OY £1). CONSTANT 
Wg 


n= NUMBER OF CHANNELS =100 


CHANGE IN FALSE ALARM PROBABILITY -(a,¢-@¢) /ang 
3 


8 10 12 14 16 18 20 
THRESHOLD TO RMS NOISE RATIO Ro/V py -db 


Fig. 2—Change in false alarm rate with the number of channels 
over which noise correlation is considered. 
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8 10 l2 14 16 18 
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Fig. 3—Change in false alarm rates for different amounts of channel 
overlap, n = 20 
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Fig. 4—Change in false alarm rates for different amounts of channel Fig. 6—Change in false alarm rate with different number of filters 
overlap, » = 100. of constant bandwidth and constant filter separation. 
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ig. 5—Change in false alarm rates for different amounts of channel Fig.7—Change in false alarm rate with different number of filters 
overlap, n = 500. : of constant bandwidth covering the same frequency band. 
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PROBABILITY OF INCORRECT DISMISSAL 


The probability of incorrect dismissal at a prescribed 
threshold Ry is equal to the probability that all of the 
filter outputs remain below a threshold Ro, if a signal of 
amplitude A is applied to the filter bank. Thus 


ie 
v0 


The value of 6 obtained by substituting p(R,, --- R,) 
of (8) into (19) is designated as 8,.,,. Substituting the 
joint probability distribution with neglected correlation 
effects into (19), one obtains the incorrect dismissal prob- 
ability Bao corr- The ratio of these two incorrect dismissal 


probabilities is clearly 
Ro Ro 
| | paarear, 


Ro 
Oe [p++ Ry) dR +++ dR, (19) 
0 


2 
s—1 9rs 


ay, 


Pet 1 ty 


[Bic corr r=1 K (Ry) K,(Ro) 
Ro Ro Ro 
: / [ / Ej, dR; dR, dR, 
a 0 0 0 . 20 
+ Dy ie K (Ry (Kia F (20) 


where g,., D,., Eszr, 7, and m are given by (6), (10), (11), 
(12), and (13) respectively, and where 
Ro 

K (Re) = | pil) aR. (21) 
The above integrals and the summations have been 
evaluated numerically by means of a digital computer. 
The plots in Figs. 8-17 show the change in the incorrect 
dismissal probability for different threshold-squared- 
mean-square-noise ratios 2,  filter-separation-to-filter- 
bandwidth ratios y, signal-to-threshold ratios (¢ = 
A/R,), and number of filters n. 


Discussion 


False Alarm Probabilities 


The first-order correction to the false alarm probability 
a is computed by evaluating (16) for different threshold- 
to-rms-noise ratios (Ro/V/y = Wz), filter-separation-to- 
bandwidth ratios (Aw/ws = y), and numbers of filters n. 

The upper limit of the summation in (16) determines 
the number of adjacent channels k, over which noise 
correlation is considered. With the limit equal to (n — 1), 
noise correlation over all n filters is taken into account. 
If the summation is approximated by the 7 = 1 term, only 
the effects of noise correlation between adjacent filters 
are considered. It is shown in Fig. 2 that for Aw = wg 
noise correlation over a few filter channels (k = 5) gives 
almost the same correction to the false alarm probability 
a as noise correlation over all possible filter pairs (k = n). 

Figs. 3 to 5 show the correction to the false alarm 
probability a@ with increasing threshold-to-rms-noise 
ratios R,/Wyw at different filter separations y. The 
number of filters nm is constant in each figure. Increased 
filter separations y decrease the noise correlation between 
adjacent channels, which decreases the a correction. The 
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POSITIVE VALUES 
-—-— NEGATIVE VALUES 


CHANGE IN INCORRECT DISMISSAL PROBABILITY -(B.-Byc)/Bae 


Fig. 8—Change in incorrect dismissal probability for different signal- 
to-threshold ratios, 3 filters, signal output only in the center one. 
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CHANGE IN INCORRECT DISMISSAL PROBABILITY-( Bo -Bnc)/Bne 


8 10 12 14 16 18 20 
THRESHOLD TO RMS NOISE RATIO Ro/V yw -db 


Fig. 9—Change in incorrect dismissal probability for different 
Threshold-to-rms-noise ratios, 3 filters, signal output only in the 
center one, filter separation y = 1. 
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Fig. 10—Change in incorrect dismissal probability for different Fig, 12—Change in incorrect dismissal probability for different 
_ threshold-to-rms-noise ratios, 5 filters, signal output only in the threshold-to-rms-noise ratios, signal output in all 3 filters, filter 
center one, filter separation y = 1. separation y = 0.5. 
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‘ig. 11—Change in incorrect dismissal probability for different Fig. 13—Change in incorrect dismissal probability. for different 
ir sshield-tyma-ndise ratios, signal output in all 3 filters, filter threshold-to-rms-noise ratios, signal output in all 3 filters, filter 
separation y = 0.2. separation y = 1. 
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Fig. 16—Change in incorrect dismissal probability for different 
threshold-to-rms-noise ratios, 5 filters, signal output in the 3 
center ones, filter separation y = 1. 


Fig. 14—Change in incorrect dismissal probability for different 
threshold-to-rms-noise ratios, signal output in all 3 filters, filter 
separation y = 2, 
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Fig. 15—Change in incorrect dismissal probability for different Fig. 17—Change in incorrect dismissal probability for different 
threshold-to-rms-noise ratios, 5 filters, signal output in the 3 threshold-to-rms-noise ratios, 5 filters, signal output in the 3 
center ones, filter separation y = 0.5. center ones, filter separation y = 2. 
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orrection to the false alarm probability a is small for 

rge threshold-to-rms-noise ratios. It is less than 0.2 
er cent for Ro/ Vy = 14 db, but it can be as high as 15 
er cent for R,o/Vy = 12 db. Increasing the number of 
ilters n decreases the maximum value of the a correction 
hile shifting it to slightly higher values of Ro/VW/y. 
This maximum may exceed unity for n = 20, but it is 
lecreased to less than 25 per cent for n = 500. The maxi- 
mum occurs for n = 50 below Ro/-Vy = 9 db, and for 
= 500 at Ro/ Vw = 11 db. 

Similar relations between the correction to the false 
ularm probability a and the number of filters n can be 
een from Fig. 6, which is obtained by replotting the 

= 1 curves of Figs. 3-5. Increasing » represents here 
an increased over-all filter bandwidth if the bandwidth 

a and separation Aw of the individual filters is kept 
onstant. 

The n dependency of the curves shown in Figs. 3-6 can 
e explained qualitatively by considering the false alarm 
robability a for small threshold-to-rms-noise ratios 
R./ Vw. The probability a for a single channel is nearly 
unity for sufficiently small R,/Vy ratios. For n ¥ 1, @ 
is increased but remains less than unity regardless of 
noise correlation in the channels. This accounts for the 
Hemel: (6 corr —- @eorr)/ Gao corr Values at small Roof Vw 
ratios. With increasing Ro/ Vv y, a 1s decreased but it 
remains closer to unity if n is large. Therefore, the possible 
x correction is decreased for n large. 

When a constant frequency band is covered with an 
ancreasing number of filters n, the filter separation Aw is 
decreased and the filter overlap increased if the filter 
bandwidth w; remains constant. The maxima of the a 
correction curves in Fig. 7 still shift to higher R,/ Vy 
ratios with increasing n but the magnitude of the a correc- 
tion is increased with increasing n. This is an obvious 
consequence of the filter overlap which causes an increased 
noise correlation in adjacent filter channels. 


Incorrect Dismissal Probabilities 


The first-order correction to the incorrect dismissal 
probability 6 is computed by evaluating (20) for different 
threshold-to-rms-noise ratios Ro/Vy = Wz, filter- 
separation-to-filter-bandwidth ratios Aw/w, = y, signal- 
to-threshold ratios A/Ry = z, and numbers of filters n. 

The first set of computations assumes the signal to be 
in the center channel of three or five parallel filters. The 
signal output of all other filters is neglected. Such an 
idealization may be prompted by an attempt to simplify 
the rather complex numerical calculations. 

Fig. 8 shows that the correction to the incorrect dis- 
missal probability, 


Boots 


Bac corr 


= Bao corr 
d 


is positive for small signal amplitudes A. For very small 
signal amplitudes A, 


Brl-ea (22) 
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as seen from (16) and (21). This gives 
ise = (bk corr i} Ano corr =, Qeorr 23 
(ope corr ie i Qno corr ( ) 


which is positive according to the a curves discussed 
earlier. The 6 correction in Fig. 8 reverses sign and becomes 
negative for larger A/R, ratios. The total 8 correction 
will be (— 1) for A/R, exceeding 2 and fully correlated 
noise in the filter channels. This is evident from the vector 
diagram in Fig. 18. For full noise correlation the noise 
vectors in the signal-plus-noise and in the noise-only 
channels are identical. Whenever the signal-plus-noise 
tends to be within the circle of radius Ro, the noise will 
be outside the circle. The probability 8...,, which is the 
probability that both signal-plus-noise and noilse-only 
give an output of less than Ro, is equal to zero. Thus, the 
correction (Bsors —" Bas cort)/ ac corr 1S eCCUAI NOM Cay) 
whenever A > 2Rp. 


Fig. 18—Vector diagram of signal-plus-noise and noise-only channel 
for full noise correlation in both channels. 


The correction to the incorrect dismissal probability 
6 is shown in Figs. 9 and 10 for varying threshold-to- 
rms-noise ratios Rp/ Vy and for different z and n values. 
Increasing 2 and n increases the amount of the required 
negative 8 correction. 

The 8 corrections are recomputed for n = 3 and the 
signal in the center channel without neglecting the signal 
output of the two side channels. The curves plotted in 
Fig. 11 to 14 indicate that the 6 corrections increase with 
R./ Vy contrary to the observations made in earlier 
figures. Also, first order corrections in excess of 100 times 
are noted for the larger A/Ry and Ro/V y ratios. 

This can be readily explained by considering the limit 
of fully correlated noise in several filter channels. Fully 
correlated noise is obtained in completely overlapping 
filter channels. The signals in the output of such channels 
are identical. The incorrect dismissal probability 8 of any 
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number of such channels is equal to the probability By of a 
single channel. In n channels containing uncorrelated 
noise and identical signals, the incorrect dismissal prob- 
ability is 85. The 6 correction becomes herewith 


Becwr Ra Bas corr __ 1 oa see 
— =i . 
Oita corr Bo" 


The probability @) gets small for (A — R,)/ Vw large, 
which is the case for large A/Ry and R,/ Vy ratios; the 
8 correction reduces to 


(Cee (Bix corr lm; 
t 


1 —l 
Bars corr Bo" ; 


which increases with decreasing 8) or with increasing 
A/R, and Ro/ VW ratios. 

The curves plotted in Figs. 11-14 decrease with in- 
creasing filter separations y, but 6 corrections in excess of 
100 per cent are possible even for Aw > w, with R/V y > 
122db andTAyine = 1.2; 

Vive parallel filters with the signal located in the center 
channel are assumed in the computations leading to the 
plots of Figs. 15-17. Signal output of only the three 
center channels is considered; the signal output of the two 
outer channels is neglected. The n 5 curves in Figs. 


(24) 


(25) 


= 95 


15-17 exhibit approximately the same large Ro/V yw 
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values as the n 3 curves in Figs. 11-14. For smaller 
Ro/Vw values the n = 5 curves exhibit either larger 
positive or larger negative values, with a sign change 
occurring at approximately the same R,/ Vw values as 
for the n = 3 curves. 

Comparison of Figs. 9 and 10 with Figs. 11-17 indicates 
that neglecting the signal output of the filters that are 
off the signal frequency results in 6 correction factors of 
erroneous R,/~/~ dependency and even of wrong signs. 
Although the reference to Figs. 9 and 10 may be of little 
value in connection with practical decision problems, they 
are still included in this paper for illustration purposes. 


Limitations of the Present Work 


The curves plotted in Figs. 2-17 represent the first- 
order corrections to the false alarm and incorrect dismissal 
probabilities due to noise correlation in the different filter 
channels. This first-order correction can be expected to 
approximate the total correction only if the power series 
is rapidly convergent as in the case of small values for the 
first-order correction. The first-order terms may give a 
non-accurate indication of the total correction whenever 
the first-order correction is close to or even above unity. 
The calculation of higher-order corrections has not been 
attempted in this paper. 


Application of Modular Sequential Circuits to Single 
Error-Correcting P-Nary Codes’ 


T. E. STERN{ anp B. FRIEDLAND{ 


I. InrRoDUCTION 


T is the purpose of this paper to present systematic 
methods for the construction of close-packed single 
error-correcting multi-level codes, coders and 

decoders. The emphasis in this paper will be laid on simple 
and efficient means of constructing the coders and decoders 
using linear modular sequential circuits.’ 

Error correcting and detecting codes for binary channels 


* Manuscript received by the PGIT, March 11, 1959. This work 
was supported in part by the Department of the Navy, under Con- 
tract No. NONR 266(60). 

+ Department of Electrical Engineering, Columbia University, 
New York, N. Y. 

1 B, Friedland, ‘‘Linear modular sequential circuits,’ IRE Trans. 
on Circuit Tueory, vol. CT-6, pp. 61-68; March, 1959. 

2B, Friedland and T. E. Stern, ‘‘On Periodicity of States in 
Linear Modular Sequential Circuits,” this issue, pp. 136-137. 


have been developed by Hamming,° Slepian,* Huffman,°”* 
and others. The work of Huffman is of special importance 
here because of his use of linear sequential filters’ for 
coding and decoding. The modular sequential circuits to 
be described in this work are generalizations of Huffman’s 
filters. Multi-level codes have been discussed by Golay’ 


°R. W. Hamming, “Error detecting and error correcting codes,” 
Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950. 

4D. Slepian, “A class of binary signaling alphabets,’’ Bell Sys. 
Tech. J., vol. 35, pp. 203-234; January, 1956. 
_ ®D. A. Huffman, “A linear circuit viewpoint on error-correct- 
ing codes,” IRE Trans. on Iyrormation Tueory, vol. IT-2, pp. 
20-28; September, 1956. 

®D. A. Huffman, “The synthesis of linear sequential coding 
networks,’’ “Information Theory,” C. Cherry, ed., Academic Press, 
Inc., New York, N. Y., pp. 77-95; 1956. 

7™M. J. E. Golay, ‘Notes on digital coding,’ Proc. IRE, vol. 
37, p. 637; June, 1949. 
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and Ulrich.” However, little attention has been given to 
the problem of finding simple means of realizing coders 
and decoders in the multi-level case. 


II. Crosz-PAckEpD SINGLE ERROR-CORRECTING 
P-Nary Coprs 


Consider a block of / symbols, each symbol having p 
possible levels (say, the integers, 0, 1, De ae Le 
Assume that only single errors can occur in this block, 
and that each error may occur in w possible ways 
(w < p — 1). Further, assume that each block contains k 
check digits, the remainder being arbitrary message digits. 
In a received and decoded message block, the condition 
“all check digits zero,” is to correspond to a transmission 
without error; other combinations of values of the check 
digits are to correspond uniquely to the wl ways in which 

a single error can occur. Since there are p” — 1 non-zero 
combinations of check digits, we have 


wl < p* — 1 (1) 

or 

eae 
w 


l< (2) 
If condition (2) is fulfilled with the equals sign, we have a 
one-one correspondence between each possible single 
error and each possible non-zero combination of check 
digits. A close-packed systematic code will be defined as 
one in which condition (2) is fulfilled with the equals 
sign. 

In the special case in which any symbol may be trans- 


formed to any other symbol by an error in transmission, 


. 


we have for a close-packed code, 


jes p — 1 ( = an integer for all integral 
~ 9 —1- values of p and k). 


Clearly, for a given number of levels p there is a re- 
stricted set of block lengths which result in close-packed 


(3) 


codes. (See Table I.) Of course, in a practical case, a 


block length only slightly shorter than the close-packed 
length would still result in a fairly efficient code. For 
reasons stated in Friedland,’ only codes containing a 
prime number of levels are considered in this paper. 
| (Binary codes, of course, are a special case.) 


TABLE 1 
VaLuEs or 1 = p* — 1/p — 1 


PN aeP 
OS 2 3 5 7 11 
ON 
il iL | 1 1 I 1 
2 3 4 6 8 12 
3 ik 13 31 57 133 
4 15 40 156 400 1464 


8W. Ulrich, “Non-binary error correction codes,” Bell Sys. 
Tech. J., vol. 36, pp. 1341-1388; November, 1957. 
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III. Lingzar Mopubar SEQUENTIAL CIRCUITS 


Since the presentation which follows utilizes the device 
known as a linear modular sequential circuit (LMSC), a 
brief summary of the properties of this device is included 
here. For a more complete discussion of LMSC’s see 
Friedland and Stern.’ 

A linear modular sequential circuit, modulo p, a prime, 
is a system comprising unit delays, amplifiers whose gains 
are integers < p, and summers, mod p. Such circuits can 
be treated by more or less direct extensions of most of the 
tools of linear circuit theory. 

Any LMSC, with one input a(7) and one output y(n), 
can be characterized by a difference equation 


yn +k) +ayn +k — 1) +--+ + cxy(n) 
= Boxr(n + k) + Barn +k — 1) + -:- + Ban) (4) 


over the modular field GF(p). The modular field GF (p) 
consists of the integers 0, 1, --- , p — 1 together with the 
operations - and +, modulo p. If the difference equation 
is of order k, the LMSC is said to be of order k. Following 
Friedland,, we may, by the use of the Z-transform (or 
similarly, by the use of the delay operator), derive a 
transfer function for the system of (4) in the form 


= Boe’ + Bye nm Slee 
2 + a2" + wee SP Oh ; 


The response of this system to a unit input (“unit 
impulse response’”’)*® can be calculated from (5) in a 
number of ways, e.g., long division. For a given H(z), we 
may also construct an inverse circuit, H~*(z), such that 
the transfer function of the cascade combination is unity 
(See Fig. 1). Two circuits are mutually inverse if their 
transfer functions are reciprocals. (An inverse of a given 
circuit is not always realizable, however.) 


| pe mee +1 -- 


H(z) (5) 


O7AV a -s ae 

P and @ are polynomials in z 
| eo PGLC!) 
tas Pi(z) 


Fig. 1—Inverse filters. 


From (4) or (5), we may construct a canonical flow 
graph representation of the circuit as shown in Fig. 2. 
Note that the feedback gains correspond to the coefficients 
of the denominator of the transfer function. Expressions 


9 It is convenient, but by no means necessary, to interpret the 
input and output time functions as (equally spaced) trains of im- 
pulses having magnitudes (“areas”) equal to 1, 2, --- p — 1. 


Fig. 2—Flow graph representation of LMSC. 


for the b; are 
bo = Bo 
b; + a1bo = Bi 
bs + ab, + asbo = Bo 


b,. mE Dy a; dy; = B,. (6) 


IV. Tuer Copine AND DECODING SYSTEM 


Following the method of Huffman,’ we may construct 
a coding and decoding system using two LMSC’s as 
shown in Fig. 3. The system consists of a coding filter, 
H(z); the noisy channel, the effect, of which is represented 
by a noise sequence N introduced at a summing point; 
and a decoding filter H~*(z), which is the inverse of the 
coding filter. A decision mechanism operates on the out- 
put of H~*(z) to restore the distorted received message 
to its original form. A feature of this system is that the 
decision operation is extremely simple, as we shall subse- 
quently demonstrate. 

Operation of the system is as follows [it will be assumed 
in this section that w = p — 1, so that 1 = (p* — 1)/ 
(2 =a OE 

A message block of length / consisting of (1 — k) arbi- 
trary message digits followed by k check digits (all zeros) 
constitutes the input sequence X to the coding filter. The 
output of the coding filter is, therefore, 


U = HX. 


[In the sequel, it is understood that capital letters, e.g., 
H(z), H, will refer to Z-transforms of sequences, and lower 
case letters, e.g., h(n), h, to the sequences themselves. ] 

The distorted message which is received by the decoder 
may be represented as U’ = HX + N, where N is a 
sequence of length / whose 7’th digit equals the magnitude 
of the error which has occurred in the 7’th place of U. 
For example, the noise sequence NV for a message whose 
fourth digit changed from 2 to zero, mod 5, would be 
00030000-::- 

At the receiver, the distorted message U’ is first operated 
on by the decoding filter to produce an output 
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Fig. 3—Coding and decoding channel. 


X =H U =H (AX. NX eee (7) 


Eq. (7) indicates that X’, the output of the decoding 
filter, consists of a digit-by-digit sum of the original 
message X and the response of the decoding filter to the 
noise impulse. (Only single noise impulses are considered.) 
Clearly, then, for no errors the message will be reproduced 
correctly at the output of the decoding filter. Otherwise, 
it will be necessary to subtract H~’N from X’ to obtain 
the correct message. It is the function of the decision 
mechanism to determine the form of H~'N, and to sub- 
tract this from X’. 

The determination of H~'N is accomplished in the 
following manner. First, we observe that since the last k 
digits of X were originally zero, then from (7) the last k 
digits of X’ must consist only of H-’N without any 
contribution from X. Assume that N corresponds to a 
single error of magnitude gq occurring at time n. Then, by 
linearity, the sequence H~‘N is simply q times the response 
of the filter H~* to a unit impulse occurring at time n. 
Therefore, the last k digits of x’ will always consist of a 
subsequence of k& digits of the unit impulse response of 
H™ or a constant multiple thereof. (Of course, in the case 
of error impulse occurring in one of the last k places, say 
the (J — r)th place, where r < k, the last k digits of x’ 
will consist of r + 1 digits of the impulse response or its 
multiple, preceded by k — r — 1 zeros.) 

Now, let us assume that by observing the subsequence 
contained in the last k digits of x’ we can uniquely extra- 
polate backwards in time to determine the complete 
form of H~'N. If this is possible for every subsequence 
which might occur, then we have solved the decoding 
problem. It is not difficult to deduce that such a unique 
extrapolation will always be possible if h~*(n) has the 
following two properties: 


A) The sequence consisting of the first 1 digits 
of h-'(n) preceded by k — 1 zeros must comprise 
1 subsequences of length k, each of which must be 
distinct. 

B) None of these subsequences may be a con- 
stant multiple (mod p) of any other. 


(8) 


If these two conditions hold, then the set of all sub- 
sequences and their multiples, as defined above, will 
correspond in a 1 — 1 fashion to the p” — 1 ways in which 
a single error can occur, and therefore, the required unique 
extrapolation will be possible. The fundamental problem 
in the construction of the coding and decoding system is 
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the synthesis of a decoding filter having properties (8). 
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we may deduce that an error of three has occurred in the 


This problem and its solution will be discussed in section 
VI . First, let us consider the following: 


Example A 
| Let p = 5,k = 2, thenl = 6. 


From the theory of sec. VI, it is found that an appropri- 
ate set of filters is given by 
<4 a 
ae 2+4e42 


2 + 42-92 


2 


I 


H(z) 


I 


It can be verified that 

h@) =21 4.24 0, 
22343 0, 
ARAL WS Fhe) 
oo 2 Lf 2.0% 


Observe that h~* is periodic (the semi-colon denotes the 
end of the period of length 24), and that the period can be 
subdivided into four subsequences of length /, each of 
which is a constant multiple, mod 5, of the first. Inspection 
of the first six digits of h~’ (preceded by a zero) indicates 
that properties (8) are satisfied. 

Consider a typical message sequence 


122-37 1/050. 


l 


x 
‘Then 
i. i Bo 2. 


U 
Let the noise sequence be 
m==70-3-0 0-00. 
Then 
=143202 
and 
Tiel Oe oes 2. 


| (These relations can be verified by standard transform 
techniques.) 


| Remark 


A property of this system that can be observed from 
this example is that the information digits as well as the 
‘check digits of x become altered in passing through the 
coder. 
To identify H™’N and RNID backwards, we 
examine the last two digits of 2’. We note that this pair, 
1 2, appears in the 4th and 5th places of the last sub- 
sequence of h~*. But the last subsequence is simply three 
times the first, Cone pondine to the first six digits of the 
response of H~* to an impulse of magnitude three. Thus, 


second position of x. Hence 
PNP =20b35 oles 
Subtracting this from X’, we obtain 
Opto la2 
013 352) le ommeriodaas 
=123100 


Before proceeding, let us examine the properties of the 
filter chosen for example A 


1) h-*(n) is periodic with period p* — 1. 

2) H(z) exists and is realizable. 

3) H~*(z) and H(z) are ratios of polynomials in z 
of degree not exceeding k. 


(9) 


It will be shown in section VI that properties (9) imply 
properties (8), and imply, in addition, that h~’ consists 
of p — 1 disjoint and distinct subsequences of length 
(p" — 1)/(p — 1), each one of which is a multiple of the 
first. 

We now summarize the steps in the construction of 
close-packed single error-correcting codes, coders and 
decoders (assume p a prime, and w = p — 1): 

1) Select a block length / and a number of check digits 
k such that 1 = (p* — 1)/(p — 1). 

2) Select a pair of coding and decoding filters possess- 
ing properties (9). 

3) Construct each message x(n) in the form of a block 
consisting of 1 — k information digits followed by k zeros. 

4) Obtain the coded message u(n) as the response of 
the coding filter H to x(n). 

5) Obtain the decoded noisy message «x’(n) as the 
response of the decoding filter H~* to u’(n). If the last k 
digits are zeros, the message is correct. If not, proceed 
to step 6. 

6) Write out the impulse response h~*(n) with commas 
separating the subsequences. Shift zx’(n) over h™‘(n), 
until the last k digits of «’(n) match a corresponding set 
in h~*(n). This will always be possible since h~' will contain 
every non-zero combination of digits taken k at a time. 

7) Of the set of digits appearing directly below 2’, 
subtract all those which follow the comma. (One and 
only one comma can appear preceding a digit directly 
under 2’.) The resultant sequence is x(n), the restored 
message. 

An additional example is helpful in illustrating these 
seven steps. 


Example B 

1) Choose p = 5, k = 3,1 = 31. 

2) For this case we find that an appropriate filter is 
given by 


3 
Z 


2+e+24+3° 
It can then be verified by division that 


Ly) 
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hA?m =-1403024413402121 444 044224382 Oe 


and that the sequence has the required properties. 
3) Consider a typical message block, 


an) = 1234043210123404321012340432000- 


4) The message, u(r), at the output of the coding filter, 


H@ea =e@+2 +24 3)’, is 


un) = 137232443 9311232443 2301 23824 Oe 


Let us assume that an error of 2 occurs in the 9th place, 
producing a zero in the 9th place of u’(n). 
5) The response of H™*' to u’(n) is 


win)= 1234043239133 3 S$ 20324 0 OF 2 eee ee 


Note that all digits of 2’(n) starting with the ninth (as 
underlined) differ from those of x(n). 

6) Writing h-*(n) in a position to match the last k 
digits (and incidentally, all those that follow), 


a(n) = 
han) = 
x(n) = 


7) The last sequence is the restored message. 


V. THe Stare Vectror-Matrix FORMULATION 


In the synthesis of the coding and decoding filters, a 
characterization of the LMSC in terms of a state vector 
and a characteristic matrix is more convenient than the 
transfer function characterization utilized in section II. 
The formulation which follows is completely equivalent to 
that used in the preceding discussion. 

Referring to Fig. 2, we may represent the state of a k’th 
order circuit at time n as the column vector 


8,(n) 


Sy sin) 


S,(n)_|. 


We define 


os) 
I 


0 
as the “null state,” and it will be assumed that s(0) = 


0, unless otherwise stated. Using the state concept, the 
operation of the circuit is described as follows 


y(n) = Cs(n) + Dx(n) 


s(n + 1) = As(n) + Bx(n) (10) 


12:3°410 4.302 3.3°193.:3°3' 35220539410 Om 2 332 e2 oe 
«3 29 10° 0,2°3 0420 45373 251.53 0 452423 ors ORS oe 
1234043 210123404: 3°2.1-0 1293-4,014.3920,0505- 


where i 2 
ee ee CLD —~Ay-1 
1 0 0 
[A] =} 0 i @ (11) 
EW WO ase 1 Oa 


(This form of matrix is known as a “companion” form.) 
|A — XZ] = MY 4+ adh? + ar? Hoe) + apd + ay. 


In the case of one input and one output, 


x(n) = x(n) 

yn) = yn) 
b; 
Pet [C],= [00 --- 01] 

[B] = ipa (12) 
by 


The relations between the 8;’s appearing in H(z) and the 
b,’s appearing in Fig. 2 and in the matrices have been 
defined previously [see (6)]. 

It will be observed that the same polynomial appears 
throughout this discussion, first in the difference equation, 
then as the denominator of the transfer function, and now 
as the characteristic equation of A. The matrix A is 
defined as the “characteristic matrix” of the LMSC, and 
it will be seen that its characteristic equation plays a 
fundamental role in the coding problem. 


ae 


p59 


Of particular interest here is the response of this system 
| a unit impulse. Since it is more convenient to deal with 
ne unexcited system, we can, as in ordinary linear system 
heory, interpret the impulse response as the homogeneous 
esponse to some “virtual” initial state, s’(0). (We say 
virtual” since the system has not actually started from 
te initial state. We have simply used s’(0) as an artifice 

replace the effect of the initial impulse.) Using this 
aterpretation, the equations for an unexcited system of 
rder k become: 


s(n) 
y(n) = 


l 


ZASO)) De. Secs tc 
oo s’(0) = | (13) 
bo 


“he form of s’(0) can easily be verified by reference to 
fig. 2. 

~The LMSC without excitation can be visualized simply 
s a shift register. The content of its left-hand place at 
ime » + 1 is calculated as a linear combination (mod 7) 
f the contents of all positions of the register at time n. 
rig. 4 shows such a register in its virtual initial state. 


SHIFT RIGHT 


/ 7 7 
Roles Tee 


y(o) 
) 


[ 
mC 


Fig. 4—Unexcited LMSC in “virtual” initial state. 


Note that the unexcited output of the LMSC is simply 
che number which appears at the right-hand end of the 
register. Note, also, that in the unexcited case, since the 
register is assumed to shift right at each clock pulse, 
| s(n) = s:i(n — 1I)G = 2,3, --- , #). 


ct can be deduced from Fig. 4, or equivalently from (13) 
und (14), that each state of the unexcited LMSC appears 
n the form of a subsequence of length & imbedded in the 
impulse response. For example, the impulse response 
)*(n) used in example B contains each successive state 
of the LMSC, H~'(z), as a subsequence of length 3. This 
structure is placed in evidence in Table II, where a portion 
of h-'(n) is displayed together with the succession of 
states of the LMSC which produces it. 


(14) 
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TABLE II 

ImpuLse RESPONSE AND STATE SEQUENCE 
n 0 ih) 2 3 4 5 6 7 8 9 
h(n) 1 4 0 3 0 », 4 4 1 3 
8i(n) 0 3 0 2 4 4 1 3 4 0 
s(n) s(n) 4 OF Sa Os ees era ee 3 | 4 
83(”) 1 4] 0 3 | 2); 4) 4 1 3 

s'(0) 


VI. GENERATION OF PERIODIC SEQUENCES: 
GALOIS THEORY 


A critical examination of the procedure outlined in sec- 
tion IV leads immediately to several fundamental ques- 
tions: 1) How can we be assured of the existence of a 
network which will produce an impulse response sequence 
of the desired periodicity? 2) How do we know that the 
sequence will have the required internal structure? 3) Is 
there a systematic method for constructing such a net- 
work? Fortunately, the mathematical theory of Galois 
fields answers the first two of these questions in the 
affirmative, and suggests an answer to the third. A 
summary of the portions of Galois theory which are perti- 
nent to this discussion is presented in the appendix. 

It has been demonstrated in Friedland and Stern’ 
that an isomorphism exists between the set of all powers 
of any square matrix over the modular field, GF(p) with 
an irreducible minimum polynomial’’ and the elements 
of a multiplication group of a Galois field. We make use 
of this isomorphism to construct periodic sequences. In 
an earlier paper, it has been shown that if the character- 
istic polynomial of a matrix A over GF(p) is irreducible, 
then A is periodic with period T, (i.e. A” = J, and T 
is the smallest exponent for which this is true). It has also 
been shown that 7 is a divisor of p” — 1. If a matrix of 
this type is the characteristic matrix of an LMSC, then 
the state sequence of the unexcited circuit, starting in any 
non-zero initial state s’(0) is periodic with the same 
period; i.e., s/(7’) = s’(0). Consequently, the impulse 
response is also periodic with period 7. It has further been 
shown that there exist matrices of this type with periods 
of maximal length 7 = p* — 1; and hence it is possible 
to synthesize an LMSC whose impulse response has the 
period p* — 1 [see (13)]. It follows that every non-zero 
state (or, equivalently, every non-zero subsequence of 
length k, in h(m)) must appear once and only once in each 
period of such a sequence. But this is exactly the type of 
structure that is necessary to fulfill conditions (9) in section 
IV. (It will become apparent as we proceed further, that 
all maximal period sequences have the required property 
that they contain p — 1 disjoint and distinct subsequences 
of length /, each one of which is a multiple of the first.) 


10 The minimum polynomial m’(«) of a matrix A is the poly- 
nomial of lowest degree such that m'(A) = 0. It is either identical 
to the characteristic polynomial or a factor thereof. In the latter 
case, A is said to be derogatory. (S. Perlis, “Theory of Matrices,’ 
Addison-Wesley Press, Cambridge, Mass., p. 147; 1952.) 
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The decoding problem has now been reduced to one of 
synthesizing an A matrix which is periodic with maximum 
length period. Let us consider a k X k matrix A with 
characteristic equation m(A) irreducible over GF(p). The 
isomorphism between powers A‘ of the matrix, and the 
elements p,(A) of the Galois group of GFlp, m(A)] is 
placed in evidence by expressing each power of A as a 
matrix polynomial in A of order < k. (That this is always 
possible, is a direct consequence of the Cayley-Hamilton 
theorem: a matrix satisfies its own characteristic equation.) 
Then, each polynomial in A is isomorphic to the same 
polynomial in X. In order that A have a period of maximal 
length, its corresponding element, A, must be a primitive 
element of the group of GF[p, m(A)] (see Appendix). We 
will now demonstrate that by proper selection of the 
polynomial, m(A), it is always possible to make ) a primi- 
tive element, and therefore make the period of A maximal 
length. 

Consider the expression 


Pe i, nara Bcc ae a 
ss II (a?! aye I] (ope ae LEW oe 
; (15) 


where pi, Po, -:: p; are the distinct prime factors of n, 
and n = p* — 1, pa prime. The products, 1G¢, are taken 
for all the combinations of distinct p’s in the numbers 
indicated, each product [] in the numerator referring 
to an even number of the p’s and each one in the denomi- 
nator referring to an odd number of p’s. 

It can be shown"’ that f,(A) is a polynomial of degree 
degree y(n) [where g(r) = the number of integers less 
than n, relatively prime to n] with integral coefficients, 
and that it contains all of the primitive n’th roots of 
unity. [The elements of the Galois group of GF(p*) are 
isomorphic to the (p* — 1)th roots of unity.] The poly- 
nomial, f,(A) ts basic to our problem since it can be 
factored, mod p, into the form 


Ain) = Wel 4 2.4.07 2-2 ort 500.4 


where each g;(A) 1s a monic k’th order polynomial irre- 
ducible over GF(p). In addition, it can be shown” that 
the necessary and sufficient condition for an element a 
of GF[p, m(A)] to be primitive is that it satisfy the ex- 
pression 


g(a) = 0 mod m() 


for some q;(A). But if we choose our modulus polynomial 
m(A) to be identical with some q;(A) then d itself satisfies 
that factor mod m(A). Thus, we deduce the following 
theorem. 


11 R. D. Carmichael, “Introduction to the Theory of Groups of 
Finite Order,’ Dover Publications, Inc., New York, N. Y., pp. 
242-288; 1956. 
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Theorem: Given a k X k matrix A over the modular 
field GF (p), the necessary and sufficient condition for A 
to be periodic with maximal length period is that its 
characteristic equation m(A) be a factor, mod p, of 
fori (A). 


This theorem can be clarified by an example. 


Example C: Design of Filter of Example I 


We have p = 5, k = 2. 
In this case, the maximum length period is 7’ = 
1 = 24, First, form f,.(A). 


yy = 


foa(X) = + 404 +1 


$(24) = 8, so that f.4 is of eighth degree. It can be factored 
into four quadratics, each of which is irreducible over 
modular field GF (p) 


foa(A) = (A? + 4d + 2)(0? +3 +. 3)(X? ++ 2)(\7 + 2A +308 


We may now construct an A matrix in companion form, 
whose characteristic equation is any one of the factors, 
say m(A) = ” + 4d 4+ 2. 


Table III lists all powers of A with their equivalent 
polynomial forms. As the table indicates, A does indeed 
have a period of length 24. We may also construct a 
transfer function based upon m(A) 

2 
OT Fe ETS ADO 

Note that the donimator is derived from m(A), but the 
numerator can be chosen more or less arbitrarily, each 
choice resulting only in a cyclic permutation of terms of 
the basic impulse response. From H(z), we can calculate 
h(n) and observe that it is of maximal length period as 
expected. 


# 1: 371.0,3°3 2 1200 see 


It should be noted that all higher powers of A which are 
relatively prime to 24, 7.e., 5, 7, 11, etc., are also primitive 
elements. Therefore, any one of them can also be used as 
the generator of a maximal length sequence. However, 
the higher powers of A will not in general be’ in the 
companion form used in (12). It is, therefore, necessary 
to reduce them to companion form by means of a simi- 
larity transformation. (A similarity transformation leaves 
the characteristic equation unchanged.) 


For example, 


ye i i issue en iF A 
2 0 1 0 
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A'=4A4T 


TABLE III 
Power Matrix Polynomial Power Matrix Polynomial 

ees 4 2 

Al Al =A. A | 4A 
ne) 4 0 
4 3 ez 

A? A? =A + 31 Au 4A + 27 
le 333 4 
74 Pe 3 

As A?=4A+ 321 At A+ 2] 
4 3 1 
come 1 
: AU SS Pasl, Sp PAE || AUK 3A + 31 
0 
f 
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Note that this matrix has a characteristic equation 
+ 3\ + 3 = 0, which is a factor of f.., thus confirming 
the fact that it has a maximum length period. From this 
example, it can be observed that once one matrix of 
maximal length period has been determined, all other 
matrices of maximal length period are easily calculated. 
Sometimes it is difficult to perform the factorization 
indicated in Example C. This is especially true for f,(A) 
of high degree. However, the example still offers an indi- 
cation of how to proceed. For example, consider the case 
illustrated in example B: Ty, = 124, and $(124) = 60. 
Therefore, there exist 20 monic polynomials of third 
degree which are appropriate characteristic equations. 
N ow, out of the 125 monic third order polynomials over 
GF(5), 85 are reducible. This leaves only 40 irreducible 
polynomials left from which to choose. Thus, by selecting 
any irreducible polynomial at random we have a prob- 
ability of 0.5 of finding an appropriate one. 
| 
| 
| The foregoing material has been principally devoted to 
the case of unrestricted single errors; 7.e.,. w = p — 1. 
However, extension to other types of single errors is quite 
straightforward. If we are to consider only close-packed 
codes, (2) with the equals sign must have an integer 
solution. It has already been stated that such is the case 
for w = p — 1. It can also be shown that this is always 
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the case for w = 1, 2; with the understanding that 
w < p — 1 in all cases. (It should be noted here that 
when we restrict ourselves to smaller magnitudes of 
errors, the allowable word length / for a given number of 
check digits increases.) For other values of w it may not 
always be possible to construct a close-packed code. For 
example, the case p = 7, k = 3, w = 4 does not have an 
integer solution for l. Therefore, to maintain some sort of 
generality, we will consider only the cases w = 1, 2. The 
former case might correspond to a channel in which 
transition from one level to the next lowest is the only 
type of error which is likely. In this case, an error would 
correspond to an impulse of magnitude p — 1. The latter 
case might correspond to the small error case considered 
by Ulrich,* where each digit may change to the level just 
above or just below. (These models may be somewhat 
unrealistic in that a transition from zero to p — 1 is 
possible in the model, although it would probably be 
unlikely in practice.) Let us consider these two cases 
individually. 

w = 1: In this case, 1 = p* — 1; @.e., the block length 
is Just equal to the period of the maximal length sequence. 
Assume that the magnitude of the expected error is m. 
Then, an error in the 7’th place of the block will produce 
a response, mh’ (n — 7 — 1), where h~'(n) is the impulse 
response of the decoding filter. Thus, we must now com- 
pare the last k digits to mh~* rather than h~*. With this 
modification, the decoding procedure is the same as that 
described in section IV. 

w = 2: In this case, the block length is just one half 
of the maximal length sequence. An important con- 
sideration here is that an h~’(n) must be chosen which 
prevents ambiguity. For example, consider the case 
p = 5,k = 3, and error magnitudes of either 1 or p — 1, 
(one level up or one level down). In this case, 1 = 62, 
which means that the block length covers two of the four 
subsequences of h~*. If one of these two subsequences 
happened to be p — 1 times the other, it would be im- 
possible to distinguish between an error of magnitude 1 
occurring in one half of the block an an error of magnitude 
p — 1 occurring in the other half. Fortunately, it is 
generally true that any k X k matrix possessing a maximal 
length period has the property that 


ACG pi yl (p > 2). 


This assures that there will be no ambiguity in the case 
just described. For other types of small errors, however, 
it is necessary to select an unambiguous sequence. 


VIII. ConstTrucTrion oF A CoDING AND 
DECODING SYSTEM 


To complete this discussion, a physical system which 
realizes the aforementioned coding and decoding systems 
is presented. We shall consider only the case of unrestricted 
errors. (Small errors can be handled with minor modi- 
fications.) The coding system is simply an LMSC having 
a transfer function H(z). 
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Fig. 5—Decoding machine. 


Fig. 5 shows the structure of the decoding system. 
Basically, it is a small computer comprising a controller, 
three shift registers, an LMSC and an accumulator. The 
function of each component is described below: 


1) Controller: Contains the program and the logic 
elements. Controls the operation of the decoder. 

2) U Register: A shift register which stores the incoming 
message, wu’(n) from right to left. During decoding, wu’ is 
shifted to the right into LMSC. 

3) LMSC: A modular sequential circuit having transfer 
function, H~*(z). Its operation, and that of all shift 
registers, is initiated by a shift pulse, so that the LMSC 
advances one state with each pulse. 

4) X Register: A shift register which stores 2’(n), 
the output of the decoding filter, H~*. When the switch 
S is in position 1, this register is loaded from the right, 
by the LMSC. 

5) H Register: A shift register which is loaded from the 
right by the LMSC when the switch S is in position 2. 
Its function is to store portions of the impulse response 
h~*(n) for comparison with the check digits of x’(n). 

6) A Register: An accumulator used for comparing 
check digits and for storing the corrected message x(n). 
(All registers are of length /.) 

A typical program for this machine might be as follows. 


Program 

1) Reset LMSC and registers X, H, and A to zero. 

2) Set S to position 1; read contents of U through 
LMSC into X (this requires / shift pulses). 

3) Set S to position 2; reset LMSC to initial state, 
s’(0). 

4) Add contents of X to A. 

5) Add complement of contents of H to A. 

6) Examine last k digits of A: 
a) If they are zero go to instruction 11; 
b) If they are not zero, go to instruction 7. 

7) Reset A to zero. 
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8) Examine the first digit of H: 
a) If it is zero, go to instruction 9; 
b) If it is not zero go to instruction 13. 
9) Apply one shift pulse to LMSC. 
10) Go to instruction 4. 
11) Print out contents of A. 
12) Go to instruction 1. 
13) Reset H to zero. 
14) Go to instruction 9. 


APPENDIX 


In the present appendix are summarized some of the — 
definitions and properties of a Galois field of order p* 
(which we shall alternatively call a field of k-dimensions). 
For proofs of the properties listed below the reader is 
referred to a standard text on group theory.” 

Elements of GF (p*) 


The set of all polynomials of degree k — 1 with co- 
efficients in GF(p) constitutes the elements of a k-di- 
mensional Galois field GF(p"). Specifically each element 
is of the form 


OO, ea Male ea Cae 
where a, Qs, --* , @ range over the integers 0, 1, --- ,4 
p — 1. There are clearly p* elements in this field. 
Addition 


Addition in GF (p") is defined as the sum, modulo p, 
of pairs of polynomials; 7.e., if 


= Oy = deh ee ee 
and 

B= by tf bad + ON 
then 


=atsB=Q + di) + Gian + Oa)A oe 


+ (a, + bye mod 9. 


Clearly the set is closed under addition, and the following 
properties are present: 


CMe fey yw! 
(6 ry) se) eaten 
There exists an x in GF(p‘) for every a such that 
aa = 0 = 0 +0) eee. 
Multiplication | 


Multiplication in the Galois field GF(p") is defined as 
the remainder obtained by dividing the product modulo 
p of a pair of elements by an irreducible base polynomial 
in GF (p) of degree k, 2.e., 


oB =oB = (qa tart: + Gin) 


(dy; ae by-1A aie oe Se bd") mod [p, m(A)]. 
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Phe double modulus, notation mod [p, m(\)] indicates 
that the product of the elements mod p is first formed, 
ind then the remainder upon dividing this product by 
n(A) is found. The product depends on the irreducible 
o0lynomial; hence, the field may be explicitly identified 
oy using the notation GF[p, m(d)]. Clearly, the set of 
elements is closed under this operation, and the following 
properties obtain: 


a'B = Bra 
a-(B-y) = (a8) -y. 


There exists an x in GF(p‘) = GF[p, m(d)] for every 
x # 0 such that 


at =1=1+0+--- +0". 
In addition the distributive property holds 
OB aye) OB ary. 
Powers of an Llement 


The non-zero elements of GF(p*) constitute an Abelian 
zroup under multiplication; hence for every a + O, there 
exists a w such that 


or = Ile 


The smallest number » = ¢ for which the above equation 
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is valid is called the period” of a [which depends on 
m(A)]|. The set of elements 

a, a, . 
constitute an Abelian subgroup of period t, with a@ as its 
generator. 

The following theorems are of particular importance in 
connection with modular sequential circuits. 

Therorem t: The period ¢ of any subgroup of the multi- 
plication group of GF (p") is a divisor of p* — 1. 

Theorem iwi: The number of elements of period ¢ in 
GF (p") is v(t), where ¢(t) is Euler’s ¢-function-and is 
equal to the number of numbers > 0 and < ¢ relatively 
prime to ¢. 

If the period of an element is p* — 1, the element 
generates the entire multiplication group of GF(p‘). 
Such an element is called primitive. From Theorem ii, 
the number of primitive elements is ¢(p* — 1). 

The period of any particular element depends on the 
base polynomial which defines multiplication in the field. 
By the appropriate choice of a base polynomial, it is 
possible to make any element in the field except zero 
have any of the allowable periods. 


2 The term period, rather than order is used to avoid confusion 
with the order of an LMSC, although the latter is used in the mathe- 
matical literature. 


Some Spectral Properties of Weighted Random Processes’ 


H. 8. SHAPIRO{ anv R. A. SILVERMANT 


_ Summary—We study the power spectrum and, more generally, 
the spectral covariance of weighted stationary processes. It is 
found that if the power spectrum of the underlying stationary 
process is suitably well behaved and properly matched to the weight 
Function, then the high-frequency behavior of the power spectrum 
and spectral covariance is especially simple. Asymptotic theorems 
describing this behavior precisely are given. 


INTRODUCTION 


fsN many problems involving random processes, the 
| . . . . 

| object of primary interest is a stationary random 
| process, x(t) say, but what is actually available for 
observation instead is the weighted process y(t) =m(t)x(t), 
obtained by multiplying x(t) by the nonrandom weight 
_ * Manuscript received by the PGIT, March 13, 1959. The re- 
search reported in this article has been sponsored by the AF Cam- 
bridge Res. Center, Air Res. and Dey. Command, under Contract 
No. AF 19(604)3495, and by the Office of Naval Res., under Con- 


ract No. N6ori-201(01). 
+ Inst. of Math. Sciences, N. Y. U., 25 Waverly Pl., New York 3, 
NEY? 


function m(t). For example, in analyzing a radio scattering 
experiment, the dielectric noise x(r),' which is usually re- 
garded as spatially stationary, is multiplied by a weight 
function m(r) representing the joint effect of the gain pat- 
terns of the transmitting and receiving antennas, and only 
the region where m(r) is large, the so-called scattering vol- 
ume, makes an appreciable contribution to the received 
scattered signal.” Further examples abound in the impor- 
tant problem of experimental power spectrum measure- 
ment, where the weight functions m(t) correspond to the 
data windows discussed by Blackman and Tukey;’ these 


1 By the dielectric noise we mean the refractive index fluctua- 
tions usually attributed to the action of atmospheric turbulence. 
Here we replace the scalar ¢ (time) by the vector r (position), since 
we are dealing with a random field. 

2 Proc. IRE, Scatter Propagation Issue, vol. 43; October, 1955. 

3R. B. Blackman and J. W. Tukey, “‘The measurement of power 
spectra from the point of view of communications engineering,’ 
Bell Sys. Tech. J., vol. 37, pp. 185-282, January, 1958; pp. 485— 
569, March, 1958. 
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authors give a detailed analysis of the practical conse- 
quences of various choices of m(t). Weighted processes have 
also been studied by Pugachev* and Burford.’ Indeed, in a 
sense, weighted processes are more physical than station- 
ary processes, since the latter have the unreasonable 
property of continuing forever with constant average 
noise power. (Of course, this property is never taken too 
seriously by applied people, who customarily allude to 
stationary processes as convenient idealizations of physical 
noises. ) 

The question with which we are mainly concerned in 
this paper is roughly the following: Under what conditions 
will the power spectra of w(t) and y(¢) be approximately 
the same for sufficiently high frequencies? [For, whereas it is 
to be expected that in general x(¢) and y(t) will have quite 
different power spectra at low frequencies, because of the 
modulatory action of m(t), it is quite possible for the 
spectra to be nearly the same at sufficiently high fre- 
quencies.] We find that to give even a partial answer to 
this question we must explore the asymptotic behavior 
of convolutions. 


PRELIMINARY ANALYSIS 


Let x(t) be a zero-mean stationary random process, 
so that Hx(t) = 0, Ex(t)e(t’) = C(t — t’), where, as usual, 
FE denotes the expectation or ensemble average, and the 
overbar denotes the complex conjugate. We assume that 
the autocorrelation function C(7) is continuous and 
absolutely integrable, and write the Wiener-Khintchine 
relations in the form 


CGie= ie Le GeeGegaln 


fis) = 5 [exp (~iwn) CC) dr, 


where the power spectrum (more accurately, the power 
spectral density) f(w) is non-negative, continuous and 
integrable; if x(¢) 1s real, f(w) is even, but more generally 
we allow x(t) to be complex. Now let m(¢t) be a bounded 
continuous function, and construct the weighted process 
y(t) = mtx). Clearly Ey@) = ~-0, Ay@ge) = 
m(t)m(t’)C(t — t'), so that in particular y(é) is not station- 
ary. [However, the normalized process y(t)/(Ely(t)|?)'? 
is stationary.] We assume that m/(t) is square-integrable, 
which implies that the sample functions of y(¢), unlike 
those of w(t), are themselves square-integrable with 
probability one. To see this, note that 


Ef |yora=f El yo Pa 


= (0) ip | may dees 


4V.S. Pugachev, ‘“The Theory of Random Functions and its 
Application to Problems of Automatic Control,’ Gos. Izdat. Tekh.- 
Teor. Lit., Moscow; 1957. (In Russian. ) 

6 T. M. Burford, ‘‘Nonstationary velocity estimation,”’ Bell Sys. 
Tech. J., vol. 37, pp. 1009-1021; July, 1958. 
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We can therefore take L’ Fourier transforms of the 
sample functions of y(t) individually, obtaining the 
spectral process’ ’’ 
Y@) = 5 [exp (—iany(y at (1) 
Die ll es 

with (spectral) covariance ['(w, w’) = EY (w) Y(’). 

We now find the relation between I'(w, w’) and the 
power spectrum of x(é). First we observe that 


[oo] 


at) = iL _ exp (twt) dX(w), (2) | 


where the integral is meant in the mean square sense, 


and the increments of the spectral process X(w) obey the 
symbolic relation®’® | 


E dX) dX@’) = fw) 6 @ — o’) dw da’. 
It follows from (1) and (2) that 


fo) 


exp (—twt)m(t)x(t) dt 


(3) 


= [ Me -) aX &), 
where 


WO > / i exp (oD 


is the Fourier transform of the weight function m(t). By — 
Plancherel’s theorem, M(w) is itself square-integrable, 
and in fact f°. |M(w)|? dw = (1/2r) f%., |m(#)|’dt. Finally, — 
forming the covariance of Y(w) and using (3), we obtain — 


Tar / ; Mw —s)M(w’ — djs) ds, (my 


the desired relation between I(w, w’) and f(w). The 
quantity T'(w, w) = H |Y(w)|? is appropriately called the 
power spectrum of y(¢), even though y(¢) is not stationary; 
for simplicity we shall henceforth abbreviate T'(w, w) to — 
just y(w). Specializing (4) to the case w = w’, we obtain 
vo) = [| Ke-9@a= [ K@fe-Has, © 
where we have introduced the notation K(w) = |M(w)|’. 
Thus, the power spectrum of y(¢) is the convolution of the 
power spectrum of z(¢) with the non-negative integrable 
kernel K(w), a familiar result.* In what follows, we shall 
assume, aS we may without loss of generality, that 
j=. K(w)dw = 1; otherwise, we need only supply an 
appropriate constant factor in certain obvious places. 


ASYMPTOTIC BEHAVIOR OF y(w) 


Examining the convolution (5), we expect that if the 


6 By f(w) « L?, we mean, as usual, that the (Lebesgue) integral 
die |f(w)|?dw is finite. 

7 Here, and occasionally below, the integral means 1.i.m.4—.. [4 

8 6(w) is the Dirac delta function. 

° M. Loéve, “Probability Theory,’ D. Van Nostrand, Inc., New 
Yorks Ne Yeuchs LO 1955: 


: 
(% 
arnel K(w) is small compared to f(w) as w > ©, and if 

) does not vary too rapidly, then y(w) should behave 
: f(w) for large w, since y(w) is an average of f(s) with 
eight K(w — s) centered at s = w. [If, on the other hand, 
(@) is large compared to f(w), the roles of f(w) and K(w) 
-e _interchanged.] Speaking even more qualitatively, 
hen two functions are convolved, it should often be 
ossible to regard the function which falls off more rapidly 
s averaging the function which falls off less rapidly. We 
1all see that this intuitive idea can be made precise under 
irly general hypotheses, but that it breaks down when 
@) is too rapidly decreasing. First we must specify what 
e mean by a function which does not vary too rapidly. 
Definition. A function f(w) ts slowly varying as w > 
riefly, slowly varying) if f(w)/f(c) tends uniformly to 1 
henever w, ¢ > © in such a way that w/o — 1. 

Slowly varying functions are related to the slowly 
scillating functions that arise in connection with 
auberian theorems. A slowly oscillating function g(w) 
one for which |g(w) — g{c)| — 0 uniformly whenever 
,¢— © in such a way that w/o — 1;”° a slowly varying 
inction, as defined above, is just the exponential of a 
owly oscillating function. For example, the functions 
L + |o|?)"* and (1 + |w|”)~* log (1 + |w|), p > 0 are 
owly varying, but exp (— ||”), p > 0 is not. The follow- 
ig facts about slowly varying functions are an immediate 
onsequence of the definition. 

1) If f(w) is slowly varying, it does not vanish for 
ifficiently large w; this observation is important in con- 
ection with certain expressions written below where 
‘@) appears in the denominator. 

2) The product and quotient of two slowly varying 
inctions are slowly varying. 

3) If for large w the function g(w) has a derivative and 
\g’@)| < C/e, then |g) — g@)| = | Je g’(s)ds| S 
‘((w/o) — 1), so that g(w) is slowly oscillating. Thus, for 
w) to be slowly varying, it is sufficient that f(w) be 
ifferentiable and nonvanishing and that w(d/dw) log f(w) 
e bounded, all for sufficiently large w. 

4) If f(w) is slowly varying, positive and continuous, 
nen if g > 0 is sufficiently large, we have f(w) < w* for 
irge w. The proof is as follows: For w > wo, f(w)/f(ew) < 2, 
'p is sufficiently close to 1,0 < p < 1. Iterating this 
equality n times, where n is the largest integer such 
r p” 'w > wo, we obtain f(w) < 2"f(p"w) < M2", where 

[ is the maximum of j OF in the interval (oo, a Now 
= p™ < (pw/w) ° = Aw’, where p ° = 2, so that 
‘w) < MAw’, which proves the result. Moreen since 
/f(w) is slowly varying, we also have f(w) > w ” for 
titable p and large w. 

It follows from the last remark that slowly varying 
inctions are, roughly speaking, functions which behave 

e a power of w at infinity. It will be noted that the 
ass of slowly varying power spectra is comfortably 
irge: for example, it contains all spectra obtained by 


10G. H. Hardy, “Divergent Series,’’ Clarendon Press, Oxford, 
ng.; 1949. See p. 124. 
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passing white noise through linear networks, but not 
band-limited spectra. (However, band-limited spectra are 
physically unrealizable.) 

Returning to our heuristic notions, we require that the 
kernel K(w) be small compared to f(w) for large w. The 
mathematical condition which expresses this most 
naturally is that 

Kw) _ 

pee Oe ~ 
It is important to note that (6) implies nothing about the 
relative “widths” of f(w) and K(w), For example, the 
“halfwidth” of Kw) = T/r(1 + (Tw)’) is 1/T, while 
that of fw) = 1/(1 + |o|?), 0 < p < 2, is 1, but with 
this choice (6) is valid for any T > 0. Thus, no restriction 
on the width of data windows will figure in our results. 

We are now in a position to state Theorem 1. 


Theorem 1: (Asymptotic convolution theorem) Let f(w) 
and K(w) be non-negative integrable functions (with 


sf. K(w)dw = 1). If f(@) ds slowly varying and non- 
increasing for sufficiently large w, and if 
K() 
li ==(; 6 
me ae) Me 
then 
7) 
Sean - 


where y(w) ts the convolution (5). 


Remark 1: Of course, the theorem remains true if © is 
replaced by — ~ in the hypotheses and conclusions. 

Remark 2: Actually, we need not assume that f(w) is 
nonincreasing, but the milder restriction 


f(c) < Cf@), when wo <w <a, (8) 


where C' is a constant independent of w. 
Proof of Theorem 1: We have 


ye) = [| K@fe-9ds+ f KOI -9as, 


where R = R(w) is a function of w to be specified later. 
For the moment we suppose only that 


RO = (10) 


as Ww om, 


R@) 


lim = (). (11) 


wro 


Let us denote the two integrals in (9) by yi(w) and y2(w), 
respectively. We consider first the quantity 


n()/fe) = [| KO — 9/1) ds, 


which it is convenient to rewrite as 


vils)/its) = [ KOse(le — 9)/fw) ds, (12) 


where ¢p(s) is the characteristic function of the interval 
(—o, R(w)), 2.e., the function equal to 1 when sg lies 
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in (—o, R(w)) and to 0 otherwise. Now the ratio 
f(a — s)/f(w) is bounded in the two-dimensional region 
w > a, — © <s < R(w), where w is a suitably large 
number. For 0 < s < R(w), this follows from (11) and 
from the fact that f(w) is slowly varying; for s < 0, it 
follows from the fact that f(w) is nonincreasing [or, more 
generally, satisfies (8)]. Moreover, for each fixed s, 
lim, {f(w — s)/f(w)} = 1 [again because f(w) is slowly 
varying]. Thus, since K(w) is integrable, we can use the 
Lebesgue dominated convergence theorem to justify 
passing to the limit under the integral in (12), with the 
result 


Lim 728) 


lim (a) = [ Ke ds =") 


Consequently, the proof of Theorem 1 will be complete 
when we show that 


Fe a 


wre fle) 


Now, in view of (6), we can write 


K@) = e)f@), 


where ¢«(w) — 0 as w > 0. This gives 


(18) 


nels) = [IGM — 9) as 


for the second integral in (9). Therefore, since f(w) is 
nonincreasing [or satisfies (8) ] 


ye) S (ROI) [fle — 9 ds = Cle RIB), 
where C’ is a constant and 
E(w) = sup e(c). 


Note that ¢)(w) is nonincreasing and — 0 as w — 0. 
Hence, to establish (13) and thereby complete the proof, 
it suffices to construct a function R(w) — © which satisfies 
(11) and is such that 


. eo Rt) f(R) ats 
lim re ae 0. 


Since f(w) is slowly varying, there exists a number op, 
Os=5p —<l. such that 


f(pw) < 2f(w), 


provided that w is suitably large. Iteration of (15) leads 
to the relation 


(14) 


wo 


(15) 


f(p"w) < 2"f(w), 


which holds for all n, again provided that w is in each case 
suitably large. Now, since €o(w) — 0, we can construct 
a sequence a, < w. < w3; < - of positive numbers 
with the properties 


(16) 


Geen Da ha (17) 


€o(p"w,) < oy 


IRE TRANSACTIONS ON INFORMATION THEORY 


September 


and also large enough to guarantee the validity of (16) 
for w > w,. We now define 
R@) = pw, 
Since R(w) > p’w, > n, for w, < w < Wa+1, (10) is satisfied. 
Moreover, since R(w)/w = p” for w, < wo < wii, (11) 
holds. Finally, using (16) and (17) we have 
e(R)f(R)/f) < 2"€(p"w,) < 3)”, 
for w, < w < @,1:, which establishes (14) and completes 
the proof of Theorem 1. 


Wn = w < Wn+1+ 


EXAMPLES AND EXTENSIONS 


We now give examples which illustrate the meaning of 
Theorem 1. In some examples one or the other of the 
hypotheses of the theorem is violated, with the result that 
lim,. {y(w)/f(w)} # 1. In other examples, the require- 
ments on f(w) are weakened but those on K(w) are 
strengthened, with the result that lim,.... {y(w)/f(w)} = 1 
remains valid. 

1) An especially simple weight function is 


mi) = Vax/T, Nap Ss 1h 
m(t) = 0, Olea, 


z.e., a “wide open” data window. The corresponding 


(Fejér) kernel is 
K@) = M’@) = sin’ Tw/rTw’. 


If now f(w) = 1/(1 + fol”), 1 < p < 2, then according 
to Theorem 1, no matter how close p: is to 2 and regardless 
of the size of 7, the convolution (5) approximates f(w) 
for sufficiently large w. (How large » must be for good 
approximation will depend, of course, on p and T.) 
This result would be difficult to verify by direct calculation. 

2) Even when it can be done, it is usually very tedious 
to verify Theorem 1 directly. For example, let K(w) = 
V2/r(1 + w*), f() = 1/0 + «), which satisfy the 
hypotheses of Theorem 1. The convolution of these two 
functions can be evaluated by residues and is found to be 
y(w) = @ + 6w* + 2w’)/@* + 40° + 80" — 8w + 4) 

+ 1/2@ — 6/(e° 4a 4 Seno 
We see at once that lim... {y(w)/f(w)} = 1. 

3) As an example of a case where condition (6) is 
violated, choose K(w) and f(w) to be the same slowly 
varying function 1/7(1 + w). Then the convolution (5) 
is found to be y(w) = 2/r(w + 4), so that lim... 
{y(w)/f(w)} = 2 instead of 1. 

4) Choose f(w) = e*'*', a > 0, which is not slowly 
varying, and let K(w) be decreasing for large w and such 
that f2.. e’°K(w)dwa = A < o. Then the convolution 
(5) 18 


@ 


(ey ae8 / e"K(s) ds + e* i e-**K(s) ds. 


Since for large enough w the second integral does not 
exceed 


e"K(w) i : gids = KGa 


959 

nd since 

| TORG) 
lim = 0, 

) wn fe) 

: follows that 

yw) 
Pee 


‘here in general A = 1. 

5) Choose f(w) = |wle”'?' and K(w) = 4e7!°!. Although 
‘w) is not slowly varying, (6) is satisfied. The convolution 
5) is an elementary integral and is found to be 


Se ee er 
o that 
Ve) _ 
San iis 


6) As a more complicated example, choose K(w) = 
exp (— ||”), p > 0," and f(w) = exp (— |w|*), where 
<q < p/(p + 1). Again f(w) is not slowly varying, 
ut (6) holds, and it is in fact true that 


limo» (y(w)/f(w)} = 1. 
' 


‘o show this, we note first that in the proof of Theorem 1 

ve can replace the hypothesis that f(w) be slowly varying 

y the weaker condition 

f(w)/f(c) — 1 unzformly as w, ¢ > © (18) 
in such a way that |\w — o| < R(w), 

rovided that we strengthen the requirements on K(w) to 


K(w) ts noninereasing for large enough w, 


as (19) 
sealant 
for then lim,. {y:()/f(w)} = 1, by an _ obvi- 


us extension of the first part of Theorem 1, and 
ince y.(w) = ff; K(s)f(w — s)ds is bounded above 
y K(R) f2. fw — s)ds, limsse {r2)/f@)} = 0 by 
19), so that lim,» {y(#)/f(#)} = 1 is still valid. Now 
ot R(w) = w*, 0 < a < 1, where a will be chosen later. 
Then, if 0 < w < o, we have 


(w)/f(o) = exp (o* — w’) 


= exp {g(o — w)p""}, 


OQ) << & 


yy the mean value theorem. If ¢ — w < w*, the exponent 
n the right will tend to zero [t.e., (18) will hold] if 
: + q < 1. Moreover, lim... {K(w*)/f(w)} = 0, pro- 
‘ided that we have gq < pa as wellasa + q < 1. These 
nequalities are compatible if we can insert a number be- 
ween g/p and 1 — q,1.e.,if¢/p <1— qorg< p/(p +1), 
yhich completes the demonstration of our example. 


; 
4 The constant a is chosen to make 


{ ce=1. 
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7) The conclusion of Theorem 1 remains true if we 
replace the hypothesis that f(w) is integrable by the 
weaker hypothesis that f(w) is bownded, provided that we 
also replace (6) by the stronger condition 


K(s) ds 
sot et) 
Since integrability of f(w) was used only in showing that 
Y2(w) satisfies lim... {y2(w)/f(w)} = 0, it suffices to show 
that the same is true with the altered hypotheses. Let 
us define ew) = f% K(s)ds/f(w), which by hypothesis 
— 0 asw— o, Then, if fw) < M, 


= 0. 


@-70 


vile) Mf KO a seme 
i i ae 


and as above we know that we can construct a function 
R = Rw) satisfying (10) and (11) and such that lim,_.. 
{e(R)fUR)/f(w)} = 0. Hence the proof is complete. 
Applying this result to the kernel of example 1, we have 
fe K(s)ds = O(1/w), so that the example is valid for 
0O<p < las wellastorie<p => 

8) Continuing our discussion of example 1, we note 
that the case p = 1 is not covered by the arguments 
given so far. This case may be inferred from the following 
extension of Theorem 1: If f(w) is assumed to be of class® 
L* , q > 1, rather than of class L*, and we define g = 
q'/(q’ — 1), then the conclusion of Theorem 1 remains 
valid if (6) is replaced by the requirement 


| il “IK }" as] 


fw) 
For we have only to apply Holder’s inequality, obtaining 


lim = {i}. 


wo 


vie) = [fle — 9K) as 


<| f° we — sia] "| [xeon as |“ 
<¢| [xo as |” 


_ and then proceed as in the previous example. In particular, 


if K(@w) = Ow”) with m > 1, and f@) = 1/(1 + |”), 
0 < p < 1, choose q’ = u/p, where uw > 1 is to be de- 
termined. Then f(w) is in the class L*, and we have only 
to verify that (w “”**)'/’w > 0 as w > &, i.e., that the 
exponent — m+ (1/¢q) + p= p+1— (p/n) — mis 
negative, which is clearly true when yu is close enough to 1. 
Hence, combining results, we find that example 1 is valid 
for'0"< p< 2. 


ASYMPTOTIC BEHAVIOR OF I'(w,w’) 


So far we have been concerned with the power spectrum 
yw) = E |Y()|’. We now turn our attention to the 
covariance T(w, w’) = HY(w) Y(#’), given by 
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Tea) = Mw — s)M(w’ — df(s) ds. —(4) 
The asymptotic behavior of I'(w, w’) is described by the 
following generalization of Theorem 1. 


Theorem 2: Let M(w) be square-integrable with auto- 


convolution 

mo) = [| Me+sM@ds, MO = 
Let f(w) be a non-negative integrable function which is 
slowly varying and nonincreasing for sufficiently large w. 
Suppose, as in Theorem 1, that 


K@) 
limes 
we f() 
where K(w) |M(w)|’, and, in addition, suppose that 
M(w) has a positive lower bound for 0 < w < A. Then, 
ifa <w’ < wall © in such a way that |w — »'| < A, 
we have 


a (6) 


Tw, w’) 
fo" Mw — w') 


uniformly, where T(w, w’) is the covariance (A). 

Remark 1: When w = w’, Theorem 2 becomes Theorem 1. 

Remark 2: Both of the remarks made after Theorem 1 
apply to Theorem 2 as well. 

Remark 3: Of course Theorem 2 is susceptible to the 
same kind of extensions as Theorem 1. 

Remark 4: If we choose w” 3(w + w’), then (20) 
asserts that I'(w, w’) is asymptotically the locally station- 
ary covariance f((w + w’)/2) M (wo — w’).” 

Proof of Theorem 2: Since the proof is almost identical 
to that of Theorem 1, we may be quite brief. We have 


>] (20) 


R — 
Eee / M@M(s + wo! — w)flw — 8) ds 
> / M(s)M(s + w’ — w)f(w — 8) ds 
R 
= T,@, w’) a P.@, w’), 
where R = R(w) is a function chosen as in Theorem 1. 


Applying the dominated convergence theorem to the 
first integral, we get 


T,@, o’) 
fo) 


where in the left side f(w) can be replaced by f(w’’) since 
f(w)/f(w’’) — 1. Then, applying the elementary inequality 
lab) < 3 (a\” + |b|*) to the second integral, we have 


Ol 
+ | M(s + w’ — w) |*}f@ — s) ds. 


lim 


anna 


= [ MOMG +o! - 4) ds = Mo -0, 


| V3(@, w’) | < 4 


2R. A. Silverman, ‘Locally stationary random processes,”’ 
IRE Trans. oN InrormatTion Tuerory, vol. IT-3, pp. 182-187; 
September, 1957. 
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It follows that T,(w, w’)/f(w), and hence I, (w, w’)/f(w) 
tends to zero precisely as in Theorem 1, where now 
{|M(s)|? + |M(s + w’ — w)|’} plays the role of K(w). 


le 


Two WEIGHTED PROCESSES 


There is another direction in which we can generalize 
Theorem 1, namely, we can consider the case of two 
weighted random processes. Thus, let m,(¢t) and mo(t) 
be two square-integrable weight functions and consider 
the two processes y,(f) = m,(é)x(é) and y.(t) = m,(t)a(d), 
both derived from the same underlying stationary process. 
z(t). Then, if Y,(w) and Y.(w) are the Fourier transforms 
of y,(é) and y(t), which by a previous argument exist 
with probability one, we have 


= BY,(@)Y.@) 


Y12(w) : 
(21) 


a iP Mio — s)M.(o — s)f(s) ds, 


where M,(w) and M,(w) are the Fourier transforms of 
m,(t) and m,(t). These considerations suggest the following 
theorem: 


Theorem 3: Let f(w) be a non-negative integrable function 
which is slowly varying and nonincreasing for sufficiently 
large w. Let M,(w) and M,(w) be square-integrable functions 
such that . 


where Y12(w) is the quantity (21). 


: M,(w)M, (w) 
ee 
Then | 
hi ale) _ _ Myles) Male) do = 5 4 . 
lim | “ome ie mid Fialt) at 
| 


Like Theorem 2, the proof of this theorem is essentially 
the same as that a Theorem 1, where this time we identify, 
K(w#) with M,(w)M,(w). eta the two remarks made 
after Theorem 1 apply to Theorem 3 as well; moreover, 
Theorem 3 is susceptible to the same extencions as 
Theorem 1. The meaning of Theorem 38 is illustrated by 
the case where m,(¢) is a translate of m,(t), ¢.e., mo(t) = 
m,(t + 7). We take m,(t) to be a real-valued data window 
which peaks about some central value. Suppose that the 
two data windows overlap only slightly, 7.e., suppose that 
7 is such that 


Uh fee le fee 
On ip m(t)m(t + 7) dt« on i, mi (Oi déa— 


Then Theorem 3 asserts that lim... {yi2(w)/y()} « 1, 
where as usual y(w) is the convolution (5). Qualitatively, 
this means that if we take two essentially non-overlapping 
sections of a stationary process a(t), we shall find that 
their high-frequency contents are only weakly correlated, 
even if the two sections both lie well within the correlation 
distance or memory of x(t), provided, of course, that 
appropriate conditions are met by z(t) and the data 
windows involved. 


Summary—A digital coding and its application to speech trans- 
ission is described. The coder determines the amplitudes and 
es of sucessive extremes (relative maxima and minima) of the 
nal. This information is decoded at the receiver by interpolating 
function between extremes so as to connect them smoothly and 
eserve the extremes of the original signal in the reconstructed 
ave. Thus, the coding is a nonlinear sampling technique. It is 
lated to clipped speech encoding which effectively transmits only 
e times of the extremes. 

The properties of the coding for speech signals have been studied 
digital simulation on an IBM 704 computer. Information rate, 
fatistics of the extremes data, and quality of the resulting signal 
ve been evaluated. The buffer size necessary to receive the 
ndomly occurring data and transmit at a constant rate was 
easured. 


INTRODUCTION 


XTREMAL coding is a simple procedure for 
representing a signal by a sequence of numbers. 
| The coding can be considered for signals which 
) are continuous functions of time, b) are to be trans- 
uitted digitally, and c) can tolerate a certain minimum 
istortion. 

The transmitted information consists of the amplitudes 
£ the signal at its extremes (a,, a, --- in Fig. 1) and the 
me intervals, t;,,; — ¢;, between extremes. In the decoding 
rocess, a suitable function is interpolated between 
xtremes as illustrated by the dashed curve in Fig. 1. 
‘he extremes of the original signal are preserved in the 
constructed wave, and there are no discontinuities in 
is wave or its first derivative at the extremes. 

The coding is basically a nonlinear, signal dependent, 
ampling procedure. As such, it is not easily analyzed 
‘ith presently available mathematics and is perhaps best 
valuated by considering an application to a specific 
ignal such as speech. The principal questions to be 
ivestigated are: 

1) What is the information rate of the representation? 
2) What is the effect of the inherent distortion? 

3) What time delay is associated with the coder? 

‘he time delay arises in a buffer which must be inserted 
etween the randomly occurring extremal information 
purce and any constant rate transmission facility. 
This paper considers in detail the coding applied to 
peech signals. Application to television signals is pre- 
ented elsewhere.’ For speech, the coding is related to 
vfinite clipping’’* in which the only transmitted infor- 


| * Manuscript received by the PGIT, May 11, 1959. 

| t Bell Telephone Labs., Inc., Murray Hill, N. J. 

18. Julesz, “Coding television signals based on edge detection,” 
ell Sys. Tech. J., vol. 38, pp. 1001-1020; July, 1959. 

~2J.C. R. Licklider and Irwin Pollack, “Effects of differentiation, 


tegration and infinite peak clipping upon the intelligibility of 
beech,” J. Acoust. Soc. Am., vol. 20; January, 1948. 

3 J. C. R. Licklider, ‘Intelligibility of amplitude-dichotomized, 
me quantized speech waves,” J. Acoust. Soc. Am., vol. 22, pp. 
20-823; November, 1950. 


59 IRE TRANSACTIONS ON INFORMATION THEORY 


129 


Extremal Coding for Speech Transmission’ 


MAX V. MATHEWSt 


ORIGINAL 
SPEECH WAVE 
—— 


INTERPOLATED 
SPEECH WAVE 
—s 


te. 2. eee teens ts 


Fig. 1—Analysis and synthesis of extremal speech. 


mation is the times of the extremes, these being the zero 
crossing times of the infinitely clipped derivative of the 
speech wave. In addition to these times, the extremal 
coding also sends the amplitudes of the wave at the 
extremes. Thus, better quality at the cost of a higher 
information rate is achieved. 

The study was carried out by digital simulation on an 
IBM 704 computer using speech-to-computer input- 
output equipment.* A number of speech samples for 
subjective evaluation were generated, and statistics of 
the extremes sufficient for channel capacity and buffer 
size estimates were measured. 


DiciraAL SIMULATION OF EXTREMAL SPEECH 


The simulation program consisted of recording a 
digitalized speech passage on an IBM 704 magnetic tape 
and processing this tape with computer programs to 
obtain a number of digitalized samples representing the 
reconstructed extremal speech for various parameter 
values. In the process, the extremal channel capacity and 
buffer size were evaluated. The reconstructed samples 
were converted to audible speech for subjective evaluation. 


A. Preparing Source Tape 


The original speech passage contains four different 
sentences spoken simultaneously into a good quality 
dynamic microphone and a carbon button microphone 


4H. E. David, Jr., M. V. Mathews, and H. S. McDonald, “De- 
scription and results of experiments with speech using digital 
computer simulation,’ IRE Wescon Convention Record, pt. 7; 
1958. 
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(telephone handset). Each sentence was uttered by a 
different speaker, two being male and two being female. 
The duration of the total passage is 10.4 seconds. As a 
result of a digital recording error, one of the handset 
sentences was ruined so the final passage comprises four 
microphone sentences (10.4 seconds) and three handset 
sentences (8.9 seconds). 

The speech was filtered with a 4-ke low-pass filter, 
sampled at 10,000 times per second, uniformly quantized 
into 1024 levels (10 bits) per sample and recorded on a 
digital computer tape which served as the input to all 
programs. Negative speech peaks were at level 0, positive 
peaks at level 1024 and speech zero at level 512. 


B. Simulation Program 


The simulation program contained two major parts, 
the first for locating the extremes, the second for inter- 
polating a wave between the extremes. Each extreme was 
located by: 

1) Selecting a sample which is an extreme with respect 

to the two adjacent samples. 

2) Passing a quadratic polynomial through the extreme 

sample and the two adjacent samples. 

3) Taking the peak of the polynomial as the extreme 

of the speech wave. 
By this interpolation process, the extreme times ¢; were 
determined to an accuracy greater than one sampling 
time. This fairly elaborate procedure for locating the 
extremes was found necessary to remove a quaver which 
occurred in the reconstructed speech if the extremes were 
located only to an accuracy of one sampling time. 

The extreme amplitude a; was further quantized into 
32, 64, or 128 levels using nonuniform quantizing 
characteristics as illustrated in Fig. 2. The quantizing 


QUANTIZING 
LEVELS BISECTED 
BY 45° EINE 


QUANTIZED a, —> 


Sl ilar? 


Fig. 2—Nonuniform quantizing characteristic. 
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characteristics are those suggested by Smith’ for PCM 
speech. The quantizing interval d; varies approximately 
according to the relation | 


Re 
Py 
where d; and the abscissa points P; are defined in Fig. 2, 
2N is the number of levels, and » is the compression ratio. 
Compression ratios of 30 db for the 128, 64, and 32 level 
quantizing, and 35 db for the 8 and 16 level quantizing 
were used. (The 8 and 16 level quantizing was used for 
PCM comparison which will be described later.) 

The reconstructed speech was generated according to 
the relation : 


OP SAN apf?) t= 1 2-N (1) 


S’() = a; + Gin — a ( me i) . 


biti a 
bo Sey (20 
in which S’(¢) is the reconstructed speech and F;; is the. 


interpolation function. Two interpolation functions have 
: 
been examined, | 


F(X) = (sin a (3) 
and : 
IORI) (4) 

Both functions have the properties that 
F(0O) = F’0) = F’0) = 0 (5) 


and 


F() 


I 
= 


(6) 


Thus, both 
a) preserve the extremes of the original speech in the 
reconstructed speech, 
b) make the reconstructed speech and its derivative 
continuous at the extremes, and 
c) introduce discontinuities in the second derivate 
at the extremes. 
The two functions are very similar and produce speech 
which sounds the same. All the data reported in this 
paper were obtained with the F(X) function. 


C. Threshold and Bump Removal Parameters 


The basic coding described above is modified by two 
parameters, one of which throws out extremes resulting 
from noise when no speech is present, the other of which 
eliminates speech wave bumps which are small compared 
with the surrounding topography. These modifications 
substantially reduce the number of extremes while only 
slightly affecting the quality of the speech. 


> B. Smith, “Instantaneous compounding of quantized signals,” 
Bell Sys. Tech. J., vol. 36, pp. 653-709; May, 1957. 


Daring the silent portions of speech, it is desirable to 
move the extremes resulting from background noise as 
ese are frequent and represent no information. A 
togram for this purpose examines groups of 10 con- 
rcutive extremes and if 


10k+9 
i 


10 & | Ci 


=10k 


(ane eae Ke Sl Dh ia (7) 
here T is the threshold, for three consecutive groups 
ree consecutive values of k), the extremes in the center 
roup are removed. The requirement that three con- 
-cutive groups be below threshold is included to preserve 
e beginnings and ends of low energy sounds which are 
ie the noise level. 

The selection of a threshold is a compromise between 
ipping low level sounds and reducing the number of 
tb ee, a compromise which depends strongly on the 
enal-to-noise ratio of the speech. The selections used will 
= discussed when the data are discussed. 


Tn addition to low level extremes, certain bumps may 
2 removed. For example, if the bump represented by 
-d; in Fig. 3 is small enough compared with the previous 


SPEECH WAVE — > 


TimMe,t —> 


Fig. 3—Plot to illustrate bump removal. 


ill aoa, then a, and a; may be eliminated with little 
ffect on speech quality. The essence of the bump remover 
rogram is to test 


Qj41 | <5: | Ghia = Ole | (8) 


here B is the bump fraction (0 < B < 1) and to remove 
he a;,, and a;,. extremes if the inequality is satisfied. 
‘he actual program has additional features which will 
ot be described in detail, so that in a sequence of sub- 
hreshold extremes (a, — ads in Fig. 3), the minimum or 
1aximum extreme (a,) will be preserved and the others 
rased (4; — as). Also, in an area of rapidly decreasing 
mplitude, large hills are not allowed to erase subsequent 
umps which are small only because they form the be- 
inning of a low amplitude section. 


| Qj+2 7 


). Distribution of Extremes and Channel Capacity 


During the simulation, the average rate of extremes, 
ne distribution of interval ¢;,, — t;, and the entropy of 
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this distribution were computed. The distributions (un- 
normalized) for most ‘of the values of threshold 7’ and 
bump fraction B which were computed are shown in 
Fig. 4. Additional distributions of somewhat different 
speech functions have been given by Davenport.° 

The distributions for the microphone samples are 
similar, all having a principal peak at 0.2 —0.3 msec and a 
subsidiary but significant peak at 1.0-1.1 msec. The hand- 
set distributions are also similar among themselves and 
have their main peak at 0.2-0.3 msec. Comparing directly 
the microphone and handset distributions, the latter have a 
stronger component at 0.1—0.2 msec. As a consequence, the 
average rate of handset extremes is substantially higher 
than that of the microphone extremes. The source of this 
effect 1s not yet known, and the only conclusion that can 
be drawn now is that carbon buttons in their present state 
are not desirable for extremal coders. 

From the distributions, an upper limit, H, on the 
average information content of the quantized t;., — ¢, 
intervals was estimated according to the relation 


H = 


SN WN 
Ss Wy, 108 i) +3 (9) 


7=1 T 


where NV, is the number of intervals whose duration lies 
between 7 — 1 and 7 sample times, and N77 is the total 
number of intervals. The estimate neglects any dependence 
between successive intervals. The additional 3 bits per 
interval is included in the relation because the interval 
is quantized to 1/8 sampling time. 

Some data on the joint distribution of a;.; — 4@, 
t:41 — t; which is not shown were also taken. The sample 
indicates that correlation between these two quantities is 
small, and little advantage can be taken of the correlation 
for coding. 

An upper limit on the channel capacity for the extremal 
data was computed by the equation, 


channel capacity 


= (Average Rate of Extremes): (log, Q + H) (10) 
where Q is the number of quantizing levels of a; and H is 
the average information content of the time interval dis- 
tribution. The channel capacity estimate assumes that 
while additional coding will be done on the time interval 
data to take advantage of their distribution function, no 
further coding of the a,’s will be done. This rather arbi- 
trary assumption was justified because the nonuniformly 
quantized a,’s have a nearly flat distribution. 

The computed data, samples 25 through 38, are 
summarized in Table I where the parameters of the 


6 W. B. Davenport, Jr., ‘Experimental study of speech-wave 
probability distributions,’ J. Acoust. Soc. Am., vol. 24, pp. 390- 
399; July, 1952. 
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TABLE I 
SuMMARY OF ExtTREeMAL Data 
Threshold 7 Inf. Content 
(Maximum of time Channel 
Quantizing speech Bump interval Bits Extremes per capacity Bits 
Sample No Source Type levels signal = 500) Fraction B per sample second per sec. 
eo 25 Microphone 32 4.5 3/32 6.84 1225 14500 
26 Handset 32 4.5 3/32 6.58 1774 20500 
27 Microphone 64 5.4 3/32 6.84 1215 15600 
28 Handset 64 4.5 3/32 6.53 1816 22700 
29 Microphone 128 4.5 3/32 6.83 1220 16900 
30 Handset 128 4.5 3/32 6.53 1830 24700 
31 Microphone 128 0.0 3/32 6.92 1572 21900 
32 Handset 128 0.0 3/32 6.55 1918 26000 
33 Microphone 128 9.0 3/32 6.92 983 13690 
34 Handset 128 9.0 3/32 6.49 1575 21200 
35 Microphone 128 4.5 0 6.76 1318 18130 
36 Microphone 128 4.5 3/16 6.94 1125 15700 
37 Handset 128 As 0 6.45 1984 26700 
38 Handset 128 4.5 3/16 6.66 1647 22500 
41 Microphone 64 4.5 3/16 6.94 1128 14600 
42 Handset 64 4.5 3/16 6.67 1650 20900 
44 Microphone 128 2.0 1/16 6.84 1480 20500 
44 Handset 128 4.0 1/16 6.52 1901 25700 
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Fig. 5—Channel capacity of extremal speech. 


arious conditions studied are listed along with the 
psulting average information content of extreme interval, 
verage rate of extremes, and channel capacity. (Also 
sted are samples 41 through 44, which will be discussed 
ater.) The channel capacity is plotted in Fig. 5. 

Prior to the subjective tests described in the next 
ection, certain qualitative judgments were formed by 
stening to the samples critically. Most of the listeners 
ad considerable previous listening experience and were 
amiliar with the general format of the coding scheme. 
1) As a function of quantizing levels, the quality 
improves substantially going from 32 to 64 levels 
but not much from 64 to 128 levels. Since the channel 
capacity increases uniformly with the logarithm of 
| number of quantizing levels, 64 levels appears to 
be the most efficient selection. 


2) The low level clipping became objectionable with 
a T of 4.5 in the microphone case and with a 7 of 
9.0 in the handset case. The absolute levels of 7 
depend so strongly on the background noise that 
they are not by themselves significant. A more 
meaningful figure is the reduction in channel capacity 
before objectionable clipping. The reduction is 
about 20 per cent in both cases. 
The bump threshold degradation was detectable 
only for B of 3/16; thus 15 per cent of the extremes 
representing minor bumps can be removed with 
almost no degradation. Such a removal is a sub- 
stantial modification of the waveform and the small 
amount of degradation so introduced is surprising. 
On the basis of these observations, four additional 
samples (41-44) were generated for more exact subjective 
evaluation. The first two samples were selected to have a 
low channel capacity with reasonable quality by setting 
Q@ = 64, T, 4.6 = and B = 3/16. The second pair were 
selected as a high channel capacity, high quality case with 
Q = 64, B = 1/16, and T = 2.0 (Microphone), T = 4.0 
(Telephone). The subjective comparisons with PCM 
speech are described below. 


3) 


E. Buffer Size 


The size of buffer required to receive the randomly 
occurring extremes and transmit extremal information 
at a constant rate is a complicated function of the statistics 
of the extremes. The function depends on such high order 
statistics that it was believed a better estimate of size 
could be obtained from a direct measurement for the 
samples at hand rather than from a statistical estimation. 
Consequently, an approximate measurement was made 
during the simulation. 

The number of samples B(t) in the buffer at any time 
may be written 


IRE TRANSACTIONS ON 
Bit) = E(t) — rt 
where H(¢) is total number of extremes occurring between 
times 0 and ¢ as illustrated in Fig. 6, and r is the rate of 
taking extremes from the buffer.’ The rate r must be 
close to the average rate of extremes 
__ KT) 

i 


where 7’ is the time over which the buffer is to be evaluated. 
The size of the buffer will then be the maximum minus 
the minimum contents: 


Buffer Size = BQ) waz — BO) wine 


This function was evaluated by the computer. 

Plots of buffer size as a function of low level threshold 
T and bump fraction B are given in Fig. 7. The size 
appears to be a random function of both these parameters 
and has no discernable trend. Such might be expected 
since the size is the difference of the absolute maxima and 
absolute minima of the difference between two functions, 
one of which fluctuates rapidly, as illustrated in Fig. 6. 
Therefore, a small change in H(t) may have great and un- 
predictable effects on the buffer size. 

The average size is about 1400 extremes. At an approxi- 
mate rate of 1400 extremes per second, the buffer will 
introduce a delay in the order of one second into the 
transmitted speech. Such a delay is several times the 
maximum tolerable delay in two-way communication, 
and would preclude such a use for the coding. The delay, 
however, could be greatly reduced in a multichannel 
system where the extremes of many speakers would com- 
bine into a more nearly uniform flow. 


SUBJECTIVE COMPARISON witH PCM 


A subjective comparison of the extremal speech with 
companded PCM speech was made to estimate a quality 
equivalence between these codings and, thus, to estimate 
the efficiency of the extremal coding relative to PCM. 
The comparison was designed to elicit a preference state- 
ment as a measure of quality. 

The test was carried out by dubbing 3 PCM samples 
and one extremal sample on 4 tracks of a 14-track tape 
recorder. Each sample contains the 4 previously described 
sentences (3 sentences in the case of telephone speech). 
The PCM samples were quantized respectively into 8, 
16 and 32 levels and sampled 10,000 times per second 
giving pulse rates of 30,000, 40,000 and 50,000 per second. 
The same digital source tape was used for the PCM and 
extremal codings so that all differences between samples 
resulted from the coding. The audio tape was formed into 
a loop so the samples repeat, the starting points of the 
samples being synchronized in all channels. 


7 With this definition, B(f) may be negative, indicating a nega- 
tive number of samples in the buffer. This situation is of course 
impossible. The actual buffer will contain (S(t) + Constant) 
samples. However, for this computation, the constant can be ne- 
glected as it does not affect the estimate of the buffer size. 
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Fig. 6—Plot showing buffer contents. 
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Fig. 7—Buffer size. 


In a test, the samples were presented to the subject 
in pairs, using the apparatus shown in Fig. 8. The experi- 
menter controlled which samples comprised a given pair, 
and the subject could listen to either member by means 
of a button which switched the output each time it was 
pushed. The subject was encouraged to listen as long as 
necessary, alternating channels as often as desired until 
he could answer the question: ““‘Which of the two samples 
do you like best or, in other words, which would you 
prefer to listen to on the telephone?” The samples were 
then changed until all possible pairs of the 4 signals had 
been presented twice to the subject (a total of 12 re- 
sponses). In each response, the preferred channel was 
ranked 1, the other 2. Five or six subjects were run for 
each test, the subjects being girls who were technically 
naive and completely unfamiliar with the coding. The 
final test results presented in Table II are the summation 
of the rankings for each coding. 

Four tests were run, two with microphone, two with 
handset speech. In the results, a low ranking indicates a 
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AN EVALUATION OF ExTREMAL CODING 


The conclusions which may be drawn from this simu- 
lation can be summarized as follows: 
1) Extremal codings can be considered for applications 


requiring digital transmission of speech. 


2) In comparison with companded PCM, the extremal 


coding requires about half the channel capacity of 
PCM for equivalent quality transmission. This 


nit rate microphone samples ranked between the 30,000 
snd 40,000 bit /sec PCM, the low being closest to the 30,000 
°?CM and the high being closest to the 40,000 PCM. 
Thus, a channel capacity reduction of about 2.2 over PCM 
s achieved by the low rate extremal and only about 1.9 
oy the high rate extremal. On the basis of this result, it 
vould seem that the inherent extremal degradation is 
such that highest efficiency occurs for low quality trans- 
mission at low bit rates. 

The handset samples, (nos. 42 and 44) by virtue of their 
nigher extreme rates, have markedly lower efficiencies 
shan the microphone samples, and lower quality compared 
with PCM. Thus, the carbon button should definitely be 
ivoided as an extremal source. 

_ The statistical significance of the paired comparisons 
ave been evaluated.” With the principal assumptions 
hat the ranking is carried out on a one dimensional 
»sychological scale and that successive rankings are 
ndependent, the probability that any of the observed 
jpreads of ranking would occur by chance is less than 0.04. 
Sonsequently, the data seems sufficiently reliable. In 
.ddition, in all except the test of sample 44, the order of 
preference among PCM samples is also the order of in- 
reasing channel capacity. This ordering lends additional 
rredence to the results. 

| The subjective comparisons indicate that a factor of 
bout two can be saved by extremal coding for speech 
ransmission, which is comparable to 30,000 bits per 
econd PCM. 


8 R. A. Bradley and M. E. Terry, ‘Rank analysis of incomplete 
lock designs,’ Biometrika, vol. 39, pts. 3 and 4; December, 1952. 


4) 


5) 


BUTTON ; : ‘ 
EXPERIMENTER factor is the maximum achieved over a range of pulse 
Fig. 8—Subjective test setup. rates and with a dynamic microphone source. 
TABLE II 
SuBsectivé Trst Resuits 
: 
Ranking 
. 30,000 40,000 50,000 
Extremal Number of Channel capacity Bit/sec Bit/sec Bit/sec Confidence 
Sample No. Subjects Speech Source Bits/sec. Extremal PCM* PCM* PCM* Rating 
41 6 Microphone 14600 59 62 49 46 .040 
43 6 Microphone 20500 54 68 53 41 004 
42 5 Handset 20900 54 49 40 OM) O11 
44 5 Handset 25700 45 57 38 40 004 
| * 10,000 samples per second 
oreferred sample. Both the low (no. 41) and high (no. 43) 3) The inherent degradation in the extreme inter- 


polation function is such that extremal coding is 
efficient only at low channel capacities. Hence, 
little improvement in quality is achieved by in- 
creasing the rate above 15,000 bits per second. This 
conclusion is reinforced by two simulations not 
reported in detail here. The first applied extremal 
coding to differentiated speech. The second used a 
more extensive interpolation rule which removed 
the discontinuities from the second derivative at the 
extremes. These modifications did not substantially 
improve the speech quality. 

In comparison with infinitely clipped speech, the 
extremal coding produces a better quality trans- 
mission using about three times the channel capacity 
(the amplitude information being additional in the 
extremal coding). The data obtained from the 
extremal simulation can also be used to obtain a 
better estimate of the information rate of the 
infinitely clipped speech. Thus, assuming a 100 usec 
time quantization, the ¢;., — 4%; distributions 
(Fig. 4) yield information rate estimates ranging 
from 4700 bits. per second to 6170 bits per second. 

A substantial buffer of about 1500 extremes or 
15,000 bits and associated transmission time delay 
of about one second is required for a single channel 
coder. This requirement contra-indicates the use of 
single channel extremal coders. However, by taking 
advantage of the more uniform statistics of a multi- 
channel system, by not requiring a completely 
uniform flow of information, and by tolerating a 
certain percentage of saturation, both the buffer 
and the time delay can be eliminated. 
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The extremal study was facilitated by digital simulation. 
Without this tool it would not have been possible to carry 
out the detailed comparisons with the precision described 

here. The extremely important questions of information 
rate and buffer size could not have been approached so 
directly, and of course, the cost and toil involved in- 
constructing an extensive laboratory model were avoided. 
Hence, it seems unlikely that the extremal study would - 


The extremal coding is an example of a class of processes 
which represent the speech time waveform as a sequence 
of simple features. Such codings achieve reductions in 
channel capacity because either a) the features occur less 
frequently than the Nyquist rate or b) a set of features are 
indistinguishable to the ear. The limit of channel capacity 
reduction achieved by the direct codings already tried 
is 3 to 4 and it is doubtful whether this limit will be much 


increased by future work. 


One side point should be mentioned in this summary. 


Correspondence 


On Periodicity of States in Linear 
Modular Sequential Circuits* 


A number of investigators have been 
concerned with sequential circuits com- 
prising ideal delays, modulo p summers, and 
amplifiers with gain equal to integers — < 
p = prime. It is convenient to call a device 
of this type a linear modular sequential 
circuit (LMSC) since it may be character- 
ized by the linear relations [1] 


y(n) = Cs(n) + Dx(n), 
s(n + 1) = As(n) + Bx(n), (1) 


where x(7), y(7), s(n) are input, output and 
state vectors, and A, B, C, and D are matri- 
ces; all are defined on the modular field 
GF(p); (7.e., the integers 0, 1, --: ,p — 1 
and the operations + and X modulo 7). 

The linear binary sequential circuits 
(p = 2) were extensively investigated by 
Zierler [2] and Huffman [3]. The latter 
also briefly considered ternary circuits. 
Hartmanis [4] considered the general case. 
Huffman demonstrated that a pair of 
binary circuits may be used as coding and 
decoding filters for binary error correcting 
codes [5]. In extending this method of 
coding to multiple level codes, some ques- 
tions concerning the periodicity of the 
states in an unexcited LMSC have arisen 
which have motivated this investigation. 
An unexcited LMSC is governed by the 
relation 


l 


s(n + 1) = As(n), (2) 


where A is an 7 X r matrix (r = order of 
the LMSC), called the characteristic matrix. 
The state period t is the smallest integer 
vy for which s(n + v) = s(n). Since, from 
(2), sm + v) = A’s(n), it is clear that the 
nullity of A‘ — I > 0. If t = T is such an 
integer that the nullity of A? — TI is equal 
to r(i.e.. A? = I), T is called the matrix 
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have been considered feasible without simulation, and 


period. The minimum polynomial m(x) of 
A is the polynomial of lowest degree k so 
that m(A) = 0. 

Lemma 1: (Fundamental Lemma) If 
m(x) is irreducible, then the set of poly- 
nomials in A of degree k — 1 over GF(p) 
form a Galois field GF(p*) of order k. 

Proof: Form the Galois field GF(p*) = 
GF[p, m(d)]; denote the elements of this 
field by 


P,Q) ae ay, ahs Gt 


+--+ far" aj} eGF(p). 


We exhibit the isomorphism P;(\)<> P(A), 
where P;(A) is a matrix polynomial in A 
with coefficients corresponding to those of 
P(X). It is obvious that P,(A) + P(A) = 
P(A) <— P.(A). Consider P,(A)P,(A). 
This product is a matric polynomial of de- 
gree 2(k — 1) which may be reduced, using 
m(A) = 0, to a polynomial of degree k — 1. 
The corresponding product P,(X)P,(A) has 
the identical coefficients since \ is a root 
in GF[p, m(\)] of m(\). Hence 


P,(A)P,(A) = P(A) @ P,Q) 


= P,QO)P.Q), 


completing the proof. 

Theorem 1: If m(x) is irreducible, the 
matrix period of A is equal to the order 
T of the subgroup generated by 2d in 
GF[p, m(d)], and every state period is 
equal to 7. (The zero state is excluded). 

Proof; Since Al <> xX, A» <> Xv. Then, 
if 7’ is the order of the subgroup generated 
by \ in GF[p, m(\)], 7 = 1, hence A? = J. 
Moreover, A” — I(v < T') may be reduced 
to a non-zero polynomial in A of degree 
k — 1 which, being an element of a Galois 
field, is nonisingular. This establishes the 
theorem. 

Corollary 1.1: The state and matrix 
period of a circuit whose minimum poly- 
nomial is irreducible and of degree k is a 
divisor of p* — 1. 


simulation demonstrated its effectiveness in this study. 


Corollary 1.2: There exists an LMSC for 
every modulus p and every order k& so that — 
the state and matrix period are of maximum — 
length p* — 1. 

(The problem of finding a LMSC with 
maximum-length period entails the deter-— 
mination of a polynomial m(A) so that 
is a primitive member of GF[p, m(\)]). 

Example 1. Consider the LMSC with a_ 


characteristic matrix 
‘ 


a=|’ > [mo 
Dy 


The minimum polynomial of A is m(z) =~ 
x? + 1 which is irreducible in GF(3). Hence 
A = 2d in GF[B, 2 + 1]. It can be shown — 
that is 4th order in GF[3, 2 + 1], hence 
PA. 

If the minimum polynomial of A is reduc- 
ible [over GF(p)], the set of polynomials in A 
of degree k — 1 do not comprise a Galois 
field. It is not possible to assert that A7 — I 
is either 0 or nonsingular; consequently, 
states lie in sequences of various lengths. 
It is clear, however, that every state period 
t is a divisor of the matrix period 7, and_ 
hence if 7 = prime, ¢ = 1 or T. To investi- 
gate the periodicity of states in LMSC’s 
which have reducible minimum _poly- 
nomials we establish: 

Lemma 2: If A is similar to B = the 
direct sum of matrices B; (¢ = 1, 2, --- , s) 
each with period R;, then the period of 
FAME Mca (Ofinn iy 220 5 IEA))- 

Proof: There exists a (nonsingular) 
matrix M so that A = MBM— where 
B = diag [B,, Bo, -:: , Bs] (the direct 
sum). Now A? = MBTM~—, hence, AT = FT 


if and only if B? = IJ. Now BY = diag 
[By, By, --* 5 Bl. Rhus) B= ef ateane 
only if Bey = I @ = 1, 2, --- , s): The 


smallest integer v for which this occurs is 
vy = T = lem. (Ri, Ro, -:: , Rs), which 
establishes the lemma. 

Theorem 2: If the factors m.(x) (¢ = 
1, 2, --- , s) (over GF(p)) of the minimum 
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olynomial are distinct, and 7; = order of 
ibgroup generated by » in GF[p, m,(A)], 


hen one period of A is T = liem. (1%, 
» Ts). 
Proof. Under the conditions of the 


heorem, each matrix B; of Lemma 2 may 
e chosen to satisfy m:(B;) 0. From 
‘heorem 1, the period of each B; equals 
he order of the subgroup generated by 
in GF[p, m:()]. Hence, by Lemma 1, 
he theorem is established. 

It can easily be shown that the di- 
vensionality of the subspace of states 
aving t = 7; is equal to the rank of the 
ratrices B; in Theorem 2. 


Example 2. Consider the LMSC whose 
haracteristic matrix is given by 
re m8 
] 0 
i 
Ls er oon _| mod 2 

(Ole! 

() LeeOr &() 
{ 
LL Oth Oa 


Tere p=2 and & = 5 and there are 2° = 32 
tates in the state space. The characteristic 
nd minimal polynomial is 


nz) =( fet 


: -(x*? + 2 + 1) mod 2 


nd each factor is irreducible in GF(2). We 
1ust thus examine the period of the sub- 
roup generated by X in both GF|2, 2 + 
— 1) and GF[2, 3 + > + 1). It is found 
hat in each group 2 is primitive; hence, 
he corresponding bce are 4 = 2? — 
= 3 and ft. = 28 — 1 = 7. Using Theorem 
, the matrix period is 


Pec In Nod |= od 


ind every state period is a divisor of 21. 
m particular, there is a 2-dimensional 
ubspace of states (3 states) having a state 
period of 3, resulting in one cycle of length 
, and a 3-dimensional subspace of states 
7 states) having a state period of 7, re- 
ulting in one cycle of length 7. The 
epmaining 21 states lie in a single cycle 
f length 21. 

The case in which there are repeated 
hetors in the minimum polynomial of A 
s governed by: 

Theorem 3: If the minimum polynomial 


h(c) = W9_, m(z)e*, and 7; = order of 
ubgroup generated by » in GF[p, m.(\)], 
en 

‘T — Reamer Gils, Klos Bd Sir?) kT.) 


F = integer > 1. 
Proof: Each matrix B; of Lemma 2 may 
\e written in the rational canonical form [7] 


ocala 0 0 
ee Cane 0 0 
oO ® Creu 

On 20 One 


Correspondence 


where C; is a matrix having m;(x) as its 
minimum polynomial; U is a matrix whose 
elements are all zero except the lower left 
hand element which is 1. (The number of 
rows of B; < e;.) Now 


[Ci Mis Ni. 
Be = Opec; No. 
En 02 aC 


(where NV; is a function of C; and U). 
Therefore, a necessary condition for By” = I 
is C;” = I; hence, R; = k:T;. 

In summary, these results represent a 
fairly straightforward application of Galois 
field theory to a specific type of sequential 
circuit. The writers believe that other 
problems in discrete systems and coding 
may be treated by the use of the theory 
of finite fields. 


BERNARD FRIEDLAND 
Tuomas E. STERN 
Dept. of Elec. Eng. 
Columbia University 
New York, N. Y, 
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“Application of 


Poincaré, Metric Reliability and 
Switching Components* 


More than fifty years ago the French 
mathematician, Jules Henri Poincaré! con- 
trasted what he called the mathematical 
continuum and the physical continuum. 
It was his impression that only the mathe- 
matical continuum had a transitive rela- 
tion? of equality and that the physical 
continuum did not have such a transitive 
relation of equality. 

This brief note serves to quantify Poin- 
caré’s notions of continuums in the context 
of probability theory, metric spaces, and the 
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unifying branch of engineering mathe- 
matics called stochastic switching cir- 
cuits.3~® 


1) Consider the set a of states of some 
switching component (the cardinality® of 
a is not to exceed aleph-null). Then, a is 
said to be the state-space of the component. 
Identify corresponding states of excitation 
and response of the switching component 
by the same positive integer. If A ea is 
a steady-state of excitation of the com- 
ponent, B e aw is the steady-state of response 
of the component resulting from the state 
A of excitation, and Ag is the set of response 
states of the component different from B, 
then let P(A, b), b e Ag be the probability 
that b corresponds to A. 

li Spex n PC, 0) exists, call iteeeCA Bb) 
The function P, is said to be the component 
probability of error. We may also think of 
P, as the probability of being able to dif- 
ferentiate between A and B in the steady- 
state and with respect to the identification 
of excitation and response states given 
previously. 


2) Based on this motivation we make 
the following postulate: 

(a) P(A, B) = 1) if and only if Ay = 5: 
for symmetry we postulate 

(b) P(A, lay) ben((8y Ze), Aes a, Be a; 
and wishing to reduce the effects of transi- 
tivity, we postulate 

(¢)) 22 C4Ay By) eBaG@) 
Aea,Bea,Cea. 


< PA, C), 


3) Put r(A, B) = —In P.(A, B) and call 
r the component reliability function. The 
intuitive significance of r is apparent. As 
P.—0thenr — + and as P, — 1 then 
r — 0. In addition, r has the usual entropy 
connotations. From section 2 of this note, 
it is a simple matter to show that 7 is a 
Fréchet metric’? on the state-space of the 
component and, hence,’ that , = r/(1 + 7) 
is a bounded metric on the state-space of 
the component. 


4) Further mathematical significance can 
be given to the previous discussion. Thus, 
there is a context in which it is possible 


“to discuss the reliability corresponding to 


a metrization of a curve in some space. 
While working on the material of this 
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Optimal Properties in the Statistical 
Theory of Reception* 


Many authors have discussed various 
applications of the Bayes’ inverse prob- 
ability theorem to the problem of optimum 
receiver design. These fall roughly into 
two not completely mutually exclusive 
classes: those which might be classed as 
pure ‘‘detectors’”’ and are expected.to make 
decisions as to the presence or absence of 
a signal; and those which are expected to 
estimate the best value of a continuous 
parameter or to continuously demodulate 
a continuously modulated signal. 

Siegert’s “Ideal Observer’’! for the detec- 
tion of radar echoes is an example of the 
first. The ‘Ideal Observer’’ may be viewed 
as one which makes successive decisions, 
in favor of that possibility which cor- 
responds to a maximum a posteriori prob- 
ability; this decision method minimizes the 
average rate of making mistakes. Modifica- 
tions of this type of detector have been 
extensively discussed by Middleton? and 
many others. In all these instances, the 
detector may be said to be optimum in 
some sense, provided the required statistical 
a priori signal information is correct. 
Woodward’ and Davies, DuWaldt‘ and 
others have discussed the application of 
the inverse probability theorem to the 
radar location problem, although the former 
point out that, at least in the military 
radar problem, a systematic scheme for 
choosing a “‘best’’ estimate of position is 
rather hard to achieve due to the usual 
paucity of a priort knowledge concerning 
the possibility of a target. Their “‘best’’ 
output, if possible, is the complete a 
posterior’ distribution function. Choosing 
the maximum of this distribution for a 
“best’’ estimate is an obvious possibility, 
but they give no justification, in terms of 
the quality of results obtained, for such a 
choice. Lacking statistical a prior? signal 
information, the most popular next choice 
is to use the Maximum-Likelihood Method*-5 
but then no optimal properties can be 
guaranteed except, in some cases, minimax- 
type properties. The most ambitious 
attempt at using maximum a_ posterior? 


probability as a basis for receiver design ° 


(and for a situation in which the a priori 
signal statistics are largely under the con- 
trol of the system designer) is probably 
carried out by Youla;* but again, no optimal 
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property related to the quality of results 
to be obtained has been demonstrated. 

It is the purpose of this note to demon- 
strate such an optimal property for a 
restricted class of receivers of the Youla 
type; namely, that the probability density 
function of the resulting linear error after 
demodulation is maximized for zero error. 
The meaning of this statement should 
become evident from the analytical dis- 
cussion which follows below. 

Let M and N be independent random 
variables representing ‘message’? and 
‘noise’’ respectively, with probability func- 
tions Py(M) and Py(N). We assume these 
functions to be differentiable. Let R = 
F(M, N) be a given function of M@ and N 
which represents the input to the receiver. 

We will assume that (0F(M, N)/dN) is 
one-sided, and for simplicity we take 
(oF(M, N)/dN). The inverse transforma- 
tion, V = 4(M, R), along with (0?4/aM oR) 


are assumed to exist. From N = #(M, R) 
we have 
| _ ab ar 
Ok ON ’ 


so that (06(M, R)/dR) > 0. First, we ask 
the following question: If R is given, what 
is the conditional probability that J lies 
in the interval (VM, M + dM)? Let A be 
the event signifying that M lies in the 
interval (MV, M + dM), and let B denote 
the event signifying that # lies in the inter- 
val (R, R + dR). We are interested in 
obtaining P(A/B). From Bayes’ theorem 
we have 


Py(M) dMP(B/A) (1) 
pelt) dk 


where P(B/A) is the conditional probability 
that R lies in the interval (Rk, R + dR), 
given M. 

Now the probability that R exceeds Ro, 
given M, is 


ech ie) — 


P[F(M, N) > R, | M] 


P(N > #(M, R,)] 


i Py(N) dN (2) 
®(M,Ro) 


so that 

P(B/A) 

eos 

a i ok @(M,R) Bath) a | at 
Pye, A) PED ar. a 


Thus (1) can be written 


P(A/B) = Pu(M)Py{®(M, B)] 
OLD [pe 


September 


If one chooses that estimate of M, for 
a given R, which maximizes (4), then the 
estimate of M is obtained by solving 


0 ; 
a {PWM Pyle Ki) 


ab(M, R)\ _ 
aR bao ©) 


for M in terms of R, for example, MV = f(2). 
Eq. (5) is the known maximum-likelihood 
method for estimating the message, 17. 

Now let us consider the following prob- 
lem: Suppose that an estimate of M, the | 
receiver output signal, is to be of the form 

= g(R) = g(F(M, N)) for any given R. 
We assume that the inverse transformation — 

= ¢(S) exists and is differentiable. The 
linear error in determining M is given by 
e = S — M, and this error will have a 
probability function given by P(e). 
Every choice, g(R), will yield a value of 
P.(0). We inquire as to the choice of g(R) 
such that P.(0) shall be a maximum among 
all possible choices of the receiver function 
g(R). We will show that the maximum — 
a posteriori probability method discussed 
above will give just this result. Now, 


P[F(M, N) > Ro] 


= i i Py(M)Px(N) dN dM 
o JN=6(M,Ro) 
(6) 

so that 
P,(R) = [Pal M)Pyl®ML, R)] 

d0(M, R) 

ok 
Let A denote the probability of the event 
(M, M + dM), and let C denote the prob- 
ability of the event (S, S + dS), with 
= g(h). 


From 
P(S > S| M) 

= PIR > ¢(S,) | ] 

= P[F(M, N) > ¢(S») | M] 
P{N > ®[M, ¢(8,)]} 


dM. (7) 


I 


/ i Py(N) dN (8) 
®{M,¢(So)] 

we note that 

P(C/A) = Py{®[M, ¢(S)]} 


d&[M, 6(S)] 
dg 
so that Bayes’ theorem yields 


P(A/C) = Px(M)Px{®[M, 6(S)]} 


M 
oe dM iy. P3(8). (10) 


e(S) dS (9) 
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irom 


PS — M > 6) 


i= [ : I ca SPCA/O) aS dM 
(11) 


t follows that 
? .(€) 
= | PxO)Px (OU, 6 + 0} 


99[M, o(e + M)] 
dg 


¢'(e + M) dM 
P (0) 
Ef : Pu(M)Px{®[M, 6(D]} 


28M, o(M)] 
ag 


We wish to determine ¢() which maxi- 
mizes P.(0). One notes that the substitution 
= ¢(M) reduces the integrand of (12) 
-o just that expression which is extremalized 
m (5), so that P.(0) is indeed a maximum 
or M = f(R) as given by (5). 

The curves in Fig. 1 illustrate our 
esults. The curve (a) typifies a probability 
tensity function for the linear error for 
vhich the maximum-likelihood method is 
ised for estimating the message, while 
surve (b) represents a p.d.f. for the linear 
error for an arbitrary continuous rule for 


jetermining the unknown message. 
| 


¢’(M) dM. (12) 


P(e) 


Fig. 1 


_ An alternative proof of our result as 

suggested by the reviewer proceeds as 

‘ollows. Let the random variables x and y 

denote ‘‘message’’ and “received signal,” 

respectively. This communication process is 
characterized by three frequency functions, 

r(x), gly), and h(x/y), such that 

EL) IA f(a) de = the a priori prob- 
ability that the mes- 
sage xz is in A; 

2) fx» gly) dy =the probability of 
observing the received 
signal y in the set 

| B; and 

8) fa h(x/y) dx = the a posteriori prob- 
ability that the re- 
ceived signal y orig- 
inated from a message 
x belonging to the 
set A. 
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Of necessity, 


Ha) = [ oadh(o/y) ay. 


For any fixed y = yo we postulate the 
existence of a unique finite zo such that 


h(Xo/Yo) 2 h(x/Yo), 
169 << LCD; 


(13) 


(14) 


This correspondence yo — xo determines a 
single-valued function z = m(y) such that 
m(y) is the a posterior? maximum-likelihood 
estimate of x, given y. 

Let « = n(y) denote any arbitrary con- 
tinuous receiver rule for determining the 
message x. From the definition of m(y) it 
follows that 


h(m(y)/y) 2 h(nty)/y). (15) 


P.(0) and P.*(0) as given by (12) for 
the maximum-likelihood estimate of x and 
the arbitrary receiver rule for determining 
x can be expressed by 


PAO) = [a hlm(a)/a) dy; 
= (16) 


P40) = | gwhn)/a) ay 


As a consequence of (15) it follows that 
PAOlpeeP 0), C)e ID, 


Harry Lass 

Jet Propulsion Lab.-CIT 
Pasadena, Calif. 
Rosert M. Stewart 
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Two Notes on a Markoff Envelope 
Process* 


In a recent paper! Pierce has shown that 
the envelope of the output of a high-Q 
singly tuned RLC filter is a first-order 
Markoff process when the filter input is 
white Gaussian noise. A simple proof of 
this follows from the fact that the transi- 
tion probability density function (p.d.f.) 


Pirsilir 0) f 2 
testa 
Pe ane ( ute 
exp | (1 ae ] Sy)” (1) 
w=e’, 
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pp. 163-166; December, 1958. 
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satisfies a Fokker-Planck equation? 


aP s) ie 
aes er [A(r)P] + 4 ae [B(r)P] (2) 
with 

Were , Sc a EO) 


P(ro | r, t) dr is the probability that the 
envelope has a value between r and r + dr 
at time t, given that r = ro at t = O. (Here 
we have chosen units so that, in Pierce’s 
money, yy = I fy = 3 oe = A ip S 1) 
The Fokker-Planck differential equation 
defines a first-order Markoff process and 
provides assurance that the transition 
p.d.f. P(ro | 7, t) obeys the Smoluchowski 
equation.? 

(2) and (3) can be derived from the fact 
that the in-phase and quadrature com- 
ponents x(t) and y(t) of the output of the 
high-Q filter are independent first-order 
Gaussian Markoff processes with transi- 
tion p.d.f.’s obeying? (2) with A(r) = —7, 
B(r) = 2. If we consider the diffusion of 
a particle in the XY-plane, starting at 
(%o, Yo), its density function p(x, y, t) at 
time ¢ must satisfy the equation 


i a a 
an? + ay? qe (ap) + By (yp) 


TF Le, YY 


which is obtained by combining the above- 
mentioned Fokker-Planck equations for the 
processes x(t) and y(t). Changing to polar 


co-ordinates x = 7 cos 0, y = 7 sin @ in 
(4), we get 

1@ zy 1 0° 

r or (2 sr r Oe 


a Eee 


The envelope process 7(¢) is just the radius 
vector to the diffusing particle. Since the 
initial phase 4 is uniformly distributed in 
0 < 6) < 2m, the density function P(7» | 7, ¢) 
for the envelope is independent of the 
angle @. This function represents the density 
of particles in an annulus of area 2ardr 
at radius r in the XY-plane, and must 
therefore be 2zar times the function p 
appearing in (5). Making the substitution 
P(ro| 1, t) = 2arp(a, y, t) in (5), we obtain 
(2), (3) immediately. 

The Fokker-Planck equation (2) can be 
used to find the p.d.f. Q(ro | a, t) of the 
first time ¢ that the envelope r(t) crosses 


2M. C. Wang and G. E. Uhlenbeck, “On the 
theory of the Brownian motion II,’”’ Rev. Mod. Phys., 
vol. 17, pp. 323-342; April, July, 1945. (Reprinted 


in N. Wax, ‘‘Noise and Stochastic Processes,” 


Dover Publications, Inc., New York, N. Y., pp. 
113-132; 1954. : 
3G. E. Uhblenbeck and L. S. Ornstein, ‘‘On the 


theory of the Brownian motion,” Phys. Rev., vol. 
36, pp. 823-841; Sept. 1, 1930, (Reprinted in N. 
Wax, op. cit., pp. 93-111.) 
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a threshold level r = a, given that it had 
the value 7) at ¢ = 0. This first-passage-time 
problem can be solved by the methods of 
Siegert, for which we need the Laplace 


transform 


Peat) e = [ Pe |r, ) dé (6) 


v7) 


of the transition p.d.f. It can be found 
by solving the Laplace-transformed Fokker- 
Planck equation (2) with appropriate 
boundary conditions, and it turns out to be 


Px(To r) 
= 5 T(A/2)e°"*/H(r/2, 1; 73/2) 


aD N21 2), a eae oe 


= 5 T(d/2)e7"*/2 (4/2, 1; 72/2) 


W(A/2, Ly T/2), > To) (7) 


where © and W are the confluent hyper- 
geometric functions as defined by Erdélyi 
et al.6 According to Siegert,4 the Laplace 
transform Q)(7o| @) of the first-passage-time 
p-d.f. Q(7o | a, t) is then 


B(A/2, 15 70/2) | (8) 
BN/2) 1 a2) 
It must be inverted to find the p.d.f. itself. 


By the usual techniques one can obtain a 
series of the form 


OG. date ne 


k 


ONG | a) = 


(9) 


where the A, are the residues of Q)(7 | a) 
at the roots A, of 


BOG) 2 a a 2) 08 (10) 


These roots can be calculated approxi- 
mately® by interpolating between the zeros 
of the Laguerre polynomials L,,(x), since 


&(—m, 1; 2) = Tim + YL,@), 


x =a/2, m = positive integer. 


Those zeros have been tabulated by Salzer 
and Zucker.7 Some curves of the first- 
passage-time p.d.f. for particular values of 
a and for an envelope r initially Rayleigh 
distributed in 0 < ro < a are given in the 
report cited.® 


4A. J, F. Siegert, ‘‘On the first passage time 
probability function,’’ Phys. Rev., vol. 81, pp. 617- 
623; February 15, 1951. 

5 C, W. Helstrom, “The accuracy of probability 
distributions computed by the sampling approxi- 
mation,’’ Westinghouse Res. Rep. No. 8-1259-R2; 


May 21, 1956. : 
“Higher Transcendental 


6A, Erdélyi et al., 
Functions,’”’ McGraw-Hill Book Co., Ine., New 
“Tables of the 


York, N. Y., 1953. See ch. 6, pp. 248-295. 
7H. E. Salzer and R. Zucker, 

zeros and weight factors of the first fifteen Laguerre 

polynomials,’’ Bull. Am. Math, Soc., vol. 55, pp. 

1004-1012; October, 1949. (Reprinted in “‘Tables of 

Functions and of Zeros of Functions,’’ Nat’l Bur. 


of Standards, Appl. Math. Series No. 37, Washington, 
D. C., pp. 191-199; 1954.) 
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In particular one can easily calculate the 
mean first-passage-time ~ (ry | a) for an 
envelope starting at 7) and crossing a level 
a > ro, by the formula® 


Ae a) Re 


On Qx(ro | a) l,=0 


I |) 2 
5 |2 DASE ey D) 


d 2 
= BN B(r, i: 2) | 


Gaye =. 
= al ota ds 


E 32/2 8 


a 


A=0 


= 3[Hi(a’/2) — Bi(rs/2) 
ER oit (11) 


where the function Hi(x) is tabulated in 
Jahnke-Emde.* The above expression is 
most easily derived by writing out the 
hypergeometric series for each term, dif- 
ferentiating it with respect to \, and setting 
» equal to zero. The resulting series is 
then the same as is obtained when the 
above integrand is expanded in a power 
series in s and integrated term by term. 


Cari W. HreLtstrom 
Dept. of Math. 
Westinghouse Res. Lab. 
Pittsburgh, Pa. 


8, Jahnke and F. Emde, Tables of Functions, 
Dover Publications, Inc., New York, N. Y., pp. 
6-8; 1945. 


In a recent article by Pierce,! a special 
case for a Markoff envelope process was 
discussed. While the referenced paper 
would serve admirably as a tutorial ex- 
position of the manipulative mathematics 
associated with the elementary calculus 
of probability theory, the conclusions 
reached by Pierce can be arrived at in a 
much more direct and simple fashion. In 
fact, it is possible to draw more general 
conclusions regarding the order properties 
of envelope processes. 

The effects of a linear network whose 
transfer function has a finite number of 
poles on the dimensionality of a random 
process have been discussed by several 
authors.?° It was discussed in a particularly 
straightforward and concise way by Gold 
and Young,? who demonstrated that a 
linear network whose transfer character- 
istics are determined by a constant coeffi- 
cient linear differential equation of order 
N, introduces an added dimensionality of 
N to any statistical process which is passed 


9B. Gold and G. O. Young, ‘‘The response of 
linear systems to non-Gaussian noise,’’ IRE Trans. 
on InrorRMATION THEORY, vol. IT-3, pp. 63-67; 
March 1954. 

10 J. Heilfron, “‘On the response of a certain class 
of systems to random inputs,’ IRE Trans. on 
INFORMATION THEORY, vol. IT-1, pp. 59-61; March, 
1955. 
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through it. This dimensionality property 
is not restricted to Gaussian processes; 
Heilfron,* in fact, applies this property to 
cases where general, zero-memory non- 
linearities exist. 

The linear network which Pierce! has 
selected possesses two poles (a conjugate 
pair). Since the envelope detector is a 
zero memory device, the dimensionality of 
the process at its output cannot, by defini- 
tion, exceed the dimensionality of the 
process at its input. Hence, at most, the 
statistics of the envelope detector output 
are specified by a third order process or 
second order Markoff process. Pierce’s 
assumption of a network having, as a 
close approximation, ‘narrow band” sym- 
metry about a nominal center frequency 
permits, in the manner of Rice," the res- 
olution of the Gaussian noise into in-phase 
and quadrature components about the 
assumed center frequency. As is well known, 
these components also have Gaussian 
distributions and, within the validity that 
symmetry exists, are independent processes. 
As a result of the assumed narrow-band 
symmetry, the equivalent network transfer 
function effective with each component 
possesses but one simple pole. This is all 
tacitly present in Pierce’s development, 
though he did not explicitly point out this 
simple but significant property. Thus, for 
an input of white Gaussian noise, both the 


in-phase and quadrature components are — 


defined respectively by independent first- 
order Markoff processes. Demonstration 
that the envelope process also becomes a 
first-order Markoff process then becomes 
trivial. 

These conclusions can easily be extended 
to a more general case. If the predetection 
linear network is such that its transfer 
function is rational, having 2N poles, 
then, at most, the envelope function is a 
vector of a (2N)th order Markoff process. 
In particular, if a specification of narrow- 
band symmetry may be validly applied, 
the resulting envelope function becomes a 
vector of an Nth-order Markoff process. 


C. T. Istey 
Communications Div. 
Hughes Aircraft Co. 
Los Angeles, Calif. 


JS. O. Rice, “Mathematical analysis of random 
noise,” Bell Sys. Tech. J., vol. 24, pp. 75-79; January, 
1945. 


A Note on Angle Modulation by a 
Mixture of a Periodic Function 
and Noise* 


This paper examines some properties of 
the waveform 


* Received by the PGIT, April 13, 1959. 
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Y(t) = cos [wt + P(t) + N(O]. 


This expression represents angle-modulation 
of a carrier w by a mixture of a periodic 
function P(t) and a noise function N(t). 
A result formerly stated, to the effect that 
under certain conditions Y(t) can be ap- 
proximately represented as a sum of wave- 
forms, each of which is the result of N(t) 
modulating one of the sinusoidal com- 
ponents of cos [wf + P(t)], is reviewed. 
This approximate result is supplemented by 
exact statements concerning the correlation 
function of Y(t); one of these statements 
is, that under certain easily satisfied con- 
ditions, the correlation function of Y(¢) is 
that of cos [wt + P(t)] multiplied by a 
factor depending only on the properties of 
W@). 


INTRODUCTION 


This note deals with the following 
question: if one is attempting to produce 
a waveform having a line spectrum, by 
angle-modulating a carrier by a_ periodic 
function, to what extent will the unavoid- 
able addition of noise to the nominally- 
periodic modulating function cause the 
resulting waveform to lose its line spectrum 
character? Some of this noise may be con- 
sidered as deriving from the carrier fluct- 
uations. In a mathematical sense it is, 
of course, true that any amount of a non- 
periodic function or noise will destroy the 
line spectrum. However, the nontrivial 
problem is really the speed at which spaces 
between the lines in the spectrum fill in, 
as the power or other properties of the 
noise change in a manner relative to those 
of the periodic modulating function or the 
earrier. A corresponding question may be 
asked concerning the autocorrelation func- 
tion. 


AN APPROXIMATE RELATION IN THE 
Time Domain 


A result of some interest in this con- 
nection is presented as relation (14) below. 
Let 


y() = cos [wt + P()] = cos [6()] (1) 


represent the phase modulation of a carrier 
‘w by a periodic function P(t) having a 
basic period 7 = I/a. It can easily be 
shown that y(é) has a trigonometric 
series expansion of the form, 


>> d, cos [( + 2rna)t + B,], 
é (2) 


which represents a line spectrum. 
If we add a noise function N(¢)! to the 
modulating function P(t) we have, 


| Y(@ = cos [wot + PO) + NO] 
= cos [¢(t) + N(O)] (3) 


representing the resulting waveform Y(t). 


] 

| 

| 

| 
° 

1 Throughout this paper it is assumed that the 


random function N(t) has appropriate properties of 
stationarity and ergodicity. 


Correspondence 
Go through the following: 
Y(t) = cos ¢(t) cos N(t) 
— sin g(t) sin V(t), (4) 


sin ¢(t) = cos (sc — =) 


+ Pe — P(e 2) | (6) 


Tv 


= cos Ee = Z| + Pt) —P 


(3) g 


If on the average, P(t) — P(t — a/2w) is 
very small (that is, if it is a small part of 
a radian), or if 


2w) 
then (7) can be replaced by 


sin ¢(t) & cos EC ry x) | 


po —P(t- £) «1, (8) 


The restriction on P(t) — P(t — w/2w) 
means that the modulating phase changes 
very little during the time 7/2w, a quarter 
of a carrier cycle. Clearly, this is a case of 
practical interest. With (8), then, we use 
(2) and write 


H(t — x.) = >> d, cos | + 27na) 


(:-£) +2] 
(10) 


= es d,, COS E + 2rnat 


= >> d, sin E + 2rnat 


n 


ere Zaman | 


2 (11) 
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If the quantities (27vam/2w) are (for any 
values of » which are of interest) small 
parts of a radian, or if 
(2rnar/2w) K I, (a) 
(in other words if the effective bandwidth 
of the waveform (1) is small compared to 
the carrier frequency), we may write from 


(11) 
sin g(t) & il — =) 


ee > d, [sin wt + 2rnat + B,]. (13) 


n 


The latter approximation is not con- 
sistent with the former. 
Inserting (13) in (4) we have 


Y(t) ~ cos N(t) >> d, 
-cos [wt + 2rnat + B,] — sin N(t) 
. ys d,, sin [wt + 2anat + B,] 


~~ > d, cos [wt + 2rnat 


Taba NONE (14) 
The last result may be stated in the 
following manner: phase modulation with 
N(t) of the complex waveform (1) or (2) 
has the same effect as if N(t) were used 
to modulate each of the components 
of the waveform (2). Note that we have 
used the special conditions (8) and (12) to 
establish the approximate equality (14).? 


THE CORRELATION FUNCTION 


The approximate result (14) refers to 
the time function Y(t). We now derive 
some exact results which refer to the auto- 
correlation function of Y(t), Ry(7). We con- 
sider (3) for Y(t). Since Y(t) has a non- 
random part in the argument of the cosine, 
the statistical (ensemble) properties of 
the product Y(t)Y(¢ — 7) depend upon t. 
However, we can still get a useful correlation 
function by averaging over ¢. For our 
purposes, therefore, we shall define Ry(7) as 


Biya = (1) Va) 


where the bar denotes time average and 
the brackets denote ensemble average. 
This corresponds to the common experi- 
mental situation in which (in one idealized 
model) one takes a time average of a cor- 
relation product and then proceeds to con- 
sider the ensemble average or expected 
value. 


(15) 


2N. D. Blackman, in Tech. Mem. EDL-M43 
of the Electronic Def. Lab., April 20, 1955, gives 
the result (14) but invokes only the aid of (12). 
It seems to the writer that (8) must also be assumed. 
Eq. (12) does not imply (8); of course, neither of the 
treatments tells us how accurate (14) is. 
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Consider, then, the time average 
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cos [¢(t) + N(é] cos [N(t — 7) + M(t — 7] 


Nie 


+ 3 


The last line may be replaced by 


Upon taking ensemble averages, (17) will be 
2(cos [N(t) + N(é — 7) 
cos [p(t) + o(t + 7)] 
(sin [N(@) + NG@ — 2))) 


eos [(t) — o(t — 7) + M(t) — M(t — 7] 
cos [¢(f) + ot — 7) + M() + NG — 7)). (16) 
3 cos. [N(é) + N(é — 7)] cos [o(é) + o(¢ — 7)] 
—zsin [N(é) + N(é — 7)] sin [6(d) + o(t + 7)]. (17) 
where 
Cx(z) = (cos [N@) — N=) 1), 
Sy(7) = (sin [V(t) — M(t — 7))), 
-Cy(7) = 3 cos [¢(t) — o(t — 7))), 
(18) Sy(z7) = gsin [¢(t) — o(t — 7)]. (24) 


sin [$(t) + $(¢ + 7)]. 


From (1), ¢(¢) is of the form wt + P(t). 
Thus, the time averages appearing in (18) 
of the form 


cos [2wt — wr + P(t) — P(t — 7)], 


(19a) 
and 
sin [2wt — wr + P(t) — Pit — 7]. 
(19b) 


Now P(t) — P(t — 7) has the same period 
as does P(t), namely, 7’. It can be proven’ 
that (19a) and (19b) approach zero if the 
average is taken during a sufficiently long 
time compared with 7 and if 


2rn 


T# ; (20) 
w 


where n is an integer. In this sense, then, 
(19a) and (19b) and (17) may be replaced 
by zero, and the ensemble average of (16) 
becomes 


Ry(r) = 3(cos [p(t) — o(¢ — 7) 
te VCO) eo Nie 7) |) 21) 


Expanding (21), we have for our correla- 
tion function, 


Ry(r) = (cos [N() — N(t — 7) 
-cos [¢(t) — o(¢ — 7)] 
= Hsin [N(t) — N(t — 7) 
sin [6() — of — 7)] (22) 
Res) = (YYE — 7) 
= Cy(7)Cy(7) — Sx(z)Sr(z), (28) 


2 To be demonstrated in the appendix. 


It may be of some interest to write (23) 
in another form. Define the complex cor- 
relation coefficients 


ty(7) = (fv(Ofxt — 7)) (25) 
and 
ly T) = HOPE — 7) 
where 
f2@) oe go (26) 
and 
LOSI. 
Then 


Ry(r7) = Real part of [ry(7)r,(7)]. (27) 


Returning to the basic results (22) or (23), 
we point out that if M(t) were identically 
zero for all (¢), (22) would reduce to 


2 cos [¢(t) — o(t — 7)] = C,(7). (28) 
Therefore, (23) must be the same as 
cos ¢(t) cos @(t — 7); (28a) 


z.e., the correlation function of y(t) = 
cos [¢(t)]. To show this directly, we may re- 
place (28a) by 


4 cos [o(t) — o(t — 7)] 
+ 3 cos [o(t) + o(t — 7)], 


whose second term is one-half of (19a) 
or zero, which demonstrates the equality 
mentioned. 

The result (22) or (23) may now be ex- 
pressed in the following way: when N(t) 
is added to ¢(¢), the correlation function 
Ry(7) of the resulting noise-modulated 
function Y(t) = cos [¢(t) + N(t)], is the 
same as that of cos ¢(t), namely, C,(r), 
multiplied by. Cy(7), plus a “correction 
term’? — Sy(7)S,(r). [If (19a) and (19b) 
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were not zero, which could happen—for 
example, if (20) were not satisfied—C,(r) 
would not be identical with the correlation 
function of cos ¢(t); furthermore, the 
expression for R,(r) would contain thell 
two additional terms exhibited in (18).] 

The. “correetion--term’’ is zero in im=~ 
portant cases. Thus, if M(t) — N(t — 7)7 
has a probability density function which — 
is symmetrical about zero, the average of — 
sin [V(t) — N(t — 7)], or Sy(7), is zero, and — 
this causes the correction term to be zero. 
One important case in which this symmetry 
exists is that in which N(¢) is zero-centered 
Gaussian noise. Another is that in which 
N(t) is the result of passing Gaussian noise 
through a symmetrical limiter, or a non- 
linear circuit operating on positive and 
negative amplitudes in a symmetrical 
manner. 

With the correction term zero, 
result becomes 


Ry(7) = 3(cos [N() — N(é — 7)]) 
cos [g() — o(t — 7)] 
= (cos [N() — Nt —a)I) 
cos ¢(t) cos (tf — 7) 
= Cy(7)C,(7), (29) 


where C,(r) is the correlation function of © 
y(t). The quantity Cy(r) can apparently 
be evaluated in a number of cases. In 
particular, one may compute it in straight- 
forward manner, in the Gaussian case, and 
obtain, 


Cx(1) 


our 


l 


(cos [N(t) — N(t — 7)]) 
eri pa) (30) | 


where o? is the mean square value of N(t); 
4.x, 


o° = (NO) 


and o%p(7) is the correlation function of 
N(t); i.e 


o p(t) = (N(t)N(t — 7)). 


SUMMARY 


Thus, the approximate relation (14) may 


be augmented by exact statements, (29) and — 


(22), or (23). Statement (29) says that all 


ht 


functions y(t) [obeying (20)] will be affected — 


in the same way with respect to the cor- 
relation function by the addition of the 
modulating phase N(t). That is, that a 
simple sinusoidal function y(¢) + cos wt 
will have its correlation function multiplied 
by Cy(7+) as would a more complicated 
function y(t) = cos [wt + P(t)]. Further- 
more, since the correlation function of 


PS ear 


y(t) is the sum of the correlation functions — 
of the individual sinusoidal components — 


of y(t), the effect of N(¢) will be the produc- 


tion of a correlation function Ry(r) which — 
is the sum of the individual correlation — 
functions of y(t), each multiplied by Cy(r). — 


This can now be related to (14), which is 
the superposition of sinusoidal waves each 
modulated with the same noise N(t). One 


| 


| 
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can see that the correlation function of 
(14) would be such a sum of correlation 
functions. The derivation of (14), however, 
is contingent upon certain assumptions for 
the depth of modulation or resultant 


spectra, whereas (29) is not dependent 


upon. such. assumptions. 


Since the factor Cy(1) = (cos [N(t) — 


_N(t — 7)]) generally represents (as seen in 


the Gaussian case), a damping effect on 


\the correlation function considered as a 
function of 7 this damping effect will be 
the same whether y(t) is a simple sinusoidal 
| wave or a more complex wave form. 


The effect of N(t) upon the “power 
spectrum’”’ or spectral density function of 
the wave form (1) (which is a series of 
delta functions) is the convolving of each 


of the delta functions with the spectrum 
/associated with the factor Cy(r). That is, 


that the new spectrum [Fourier transform 
of (29)] will be the superposition of 
individual spectra, centered at the fre- 
quencies w/27 + n/T proportional to the 
strength of the original delta functions 
there. 

Thus, the effect of N(t) is to ‘‘smear’’ 
each spectral component of the original 
function (1) in the same manner. 


APPENDIX 


We wish to prove that 


Cena. 

where 

Fa) = ; / Bs Oa ED 
0 


| and where P(¢) is periodic. 


There is no loss of generality if we let 
the period of P(t) be 2x. Then consider 
the quantity F(2m7) where m is an integer: 


FQmn) = sx [cos Beep 
(32) 


Correspondence 


m2" 


ae e@"*PO 4 oe) dt, (33) 


2mar 0 
where “‘c.c.’’ means ‘‘complex conjugate.’’ 
Since P(t) is periodic with period 27, 
e*P(t) also is. Let now e*?() = f(t) and 
consider the mtegral” ~ 


1 ge awt 
o(2mm) = sa f(te*?’ dt. 


We have 


$(2mr) -si{f +f" + ne 


ar eas fe as} 


Li pert Ae 


m 


a6 prea 


1 Qa . 
ree fei! dt 


1 il a Oe 


m i—e 


1wW2T 


1 20 me 
oe, ide" di 


1 1 ae Cones : 
Se One 


™m il ey peas 


Since the maximum possible value of 
1 — e*+27 is 2, we have 


lim ¢(m2r) = 0, 


mo 


if (27) is finite and if 1 — e**7 is not zero. 
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The latter condition will be satisfied if 
w2r ~ n2r or 


aT A —— ; 
7) 


where n is an integer. 
If the period of- P(t) were T, this con- 
dition would be replaced by 


naar 


ee 
o) 


The complex conjugate of ¢(m27) will 
have similar properties, and its limit will 
also be zero. Therefore, since 


lim F(x) = lim F(m2r) + .c¢., 


LO mn 


we have 


lm F(a”) = 0. 

It will be noted that the only condition 
placed on f(¢) beside the periodicity already 
mentioned, is that the integral ¢(x) exists 
for all values of x in the range 0 — 27. 
This is a light restriction and does not 
require f(t) to be continuous. Thus, we 
have proven the fact that the average of 
cos [wt + P(t)] is zero, if T # n27/w with 
no other important restriction on P(t). 
The basis of the present proof is taken 
from Bohr’s “Almost Periodic Functions,’ 
page 51.4 
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